Effect of Training Data Volume on Performance of Convolutional Neural Network Pneumothorax Classifiers

The large datasets with high-quality labels required to train deep neural networks are challenging to obtain in the radiology domain. This work investigates the effect of training dataset size on the performance of deep learning classifiers, focusing on chest radiograph pneumothorax detection as a proxy visual task in the radiology domain. Two open-source datasets (ChestX-ray14 and CheXpert) comprising 291,454 images were merged, and convolutional neural networks were trained with stepwise increases in training dataset size. Model iterations at each dataset volume were evaluated on an external test set of 525 emergency department chest radiographs. Learning curve analysis was performed to fit the observed AUCs for all models generated. For all three network architectures tested, model AUCs and accuracy increased rapidly from 2 × 10³ to 20 × 10³ training samples, with a more gradual increase up to the maximum training dataset size of 291 × 10³ images. AUCs for models trained with the maximum tested dataset size of 291 × 10³ images were significantly higher than for models trained with 20 × 10³ images: ResNet-50: AUC(20k) = 0.86, AUC(291k) = 0.95, p < 0.001; DenseNet-121: AUC(20k) = 0.85, AUC(291k) = 0.93, p < 0.001; EfficientNet: AUC(20k) = 0.92, AUC(291k) = 0.98, p < 0.001. Our study established learning curves describing the relationship between training dataset size and the performance of deep learning convolutional neural networks applied to a typical radiology binary classification task. These curves suggest a point of diminishing performance returns for increasing training data volumes, which algorithm developers should consider given the high costs of obtaining and labelling radiology data.
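The abstract does not say which functional form the learning curve analysis used. As a rough illustration only, the sketch below assumes an inverse power law, AUC(n) = a - b·n^(-c), a common choice for learning curves, where a is the asymptotic AUC. The function names `inverse_power_law`, `fit_learning_curve` and `gain_from_doubling` are hypothetical, and the data points are synthetic values generated from a known curve, not the study's results. The fit grid-searches a and solves a log-log least-squares problem for b and c, then reports how much AUC one doubling of the training set buys, which shrinks as n grows (the diminishing-returns pattern described above).

```python
import math


def inverse_power_law(n, a, b, c):
    """Assumed (hypothetical) learning-curve model: AUC(n) = a - b * n**(-c)."""
    return a - b * n ** (-c)


def _ols(xs, ys):
    """Ordinary least squares for y = p + q * x; returns (p, q)."""
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    q = sxy / sxx
    return my - q * mx, q


def fit_learning_curve(sizes, aucs, steps=2000):
    """Fit (a, b, c) by grid-searching the asymptote a in (max AUC, 1].

    For each candidate a, log(a - AUC) = log(b) - c * log(n) is linear in
    log(n), so b and c come from a closed-form least-squares fit. The
    candidate with the smallest squared error on the AUC scale wins.
    """
    lo = max(aucs) + 1e-4
    if lo >= 1.0:
        raise ValueError("observed AUCs too close to 1 for this grid")
    xs = [math.log(n) for n in sizes]
    best = None
    for i in range(steps + 1):
        a = lo + i * (1.0 - lo) / steps
        ys = [math.log(a - y) for y in aucs]
        p, q = _ols(xs, ys)
        b, c = math.exp(p), -q
        sse = sum((y - inverse_power_law(n, a, b, c)) ** 2
                  for n, y in zip(sizes, aucs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


def gain_from_doubling(n, a, b, c):
    """AUC gained by growing the training set from n to 2n under the fit."""
    return inverse_power_law(2 * n, a, b, c) - inverse_power_law(n, a, b, c)


if __name__ == "__main__":
    # Synthetic, noise-free points from a known curve (NOT the study's data).
    sizes = [2_000, 5_000, 10_000, 20_000, 50_000, 100_000, 200_000, 291_000]
    aucs = [inverse_power_law(n, 0.96, 20.0, 0.5) for n in sizes]
    a, b, c = fit_learning_curve(sizes, aucs)
    print(f"fitted a={a:.3f}, b={b:.2f}, c={c:.3f}")
    for n in (2_000, 20_000, 200_000):
        print(f"gain from doubling {n:>7} samples: {gain_from_doubling(n, a, b, c):.4f}")
```

Under this model AUC(2n) - AUC(n) = b·n^(-c)·(1 - 2^(-c)), so for any c > 0 each doubling of the training data yields a smaller gain than the previous one. With real, noisy per-architecture AUCs, a general nonlinear solver such as `scipy.optimize.curve_fit` would be a more typical choice than this grid search.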

Keywords: effect training; radiology; training data; performance; convolutional neural; training

Journal Title: Journal of Digital Imaging
Year Published: 2022
