Background Radiologists using the ultrasonography-based Breast Imaging Reporting and Data System (BI-RADS) differ significantly in how they classify category 3–5 (BI-RADS 3–5) breast nodules, because these nodules lack clear, distinguishing image features. This retrospective study therefore investigated whether a transformer-based computer-aided diagnosis (CAD) model could improve the consistency of BI-RADS 3–5 classification. Methods Five radiologists independently performed BI-RADS annotations on 21,332 breast ultrasonographic images collected from 3,978 female patients at 20 clinical centers in China. All images were divided into training, validation, testing, and sampling sets. The trained transformer-based CAD model was then used to classify the test images, and its sensitivity (SEN), specificity (SPE), accuracy (ACC), area under the curve (AUC), and calibration curve were evaluated. Variations in these metrics among the 5 radiologists were analyzed after the radiologists referred to the CAD-provided BI-RADS classifications for the sampling test set, to determine whether classification consistency (the kappa [κ] value), SEN, SPE, and ACC could be improved. Results After the CAD model was trained on the training set (11,238 images) and the validation set (2,996 images), its classification ACC on the test set (7,098 images) was 94.89% for category 3, 96.90% for category 4A, 95.49% for category 4B, 92.28% for category 4C, and 95.45% for category 5 nodules. Against pathological results, the AUC of the CAD model was 0.924, and the calibration curve showed that the CAD-predicted probability was slightly higher than the actual probability. After the radiologists referred to the CAD BI-RADS classifications, 1,583 nodules in the sampling test set were reclassified: 905 to a lower category and 678 to a higher category.
As a result, the ACC (from 72.41% to 82.65%), SEN (from 32.73% to 56.98%), and SPE (from 82.46% to 89.26%) of each radiologist's classification improved significantly on average, and the consistency (κ value) of almost all radiologists increased to >0.6. Conclusions The radiologists' classification consistency was markedly improved, with almost all κ values rising above 0.6. Diagnostic efficiency across the total classification also improved on average by approximately 24 percentage points for SEN (from 32.73% to 56.98%) and 7 percentage points for SPE (from 82.46% to 89.26%). The transformer-based CAD model can help radiologists improve both their diagnostic efficacy and their consistency with one another when classifying BI-RADS 3–5 nodules.
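The metrics reported above can be illustrated with a short sketch. The abstract does not state which kappa statistic was used (Cohen's, weighted, or Fleiss') or how BI-RADS categories were mapped to a positive/negative call for SEN and SPE. The code below therefore assumes unweighted Cohen's kappa between two raters and a hypothetical rule that categories at or above 4A count as positive. The function names and the toy labels are illustrative only and do not reproduce the study's method or data.

```python
from collections import Counter

# BI-RADS categories in ordinal order, as named in the abstract.
CATEGORIES = ["3", "4A", "4B", "4C", "5"]


def to_binary(category, threshold="4A"):
    """Hypothetical rule: categories at or above `threshold` are positive (1)."""
    return int(CATEGORIES.index(category) >= CATEGORIES.index(threshold))


def binary_metrics(y_true, y_pred):
    """Return (SEN, SPE, ACC) for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sen = tp / (tp + fn)          # true-positive rate
    spe = tn / (tn + fp)          # true-negative rate
    acc = (tp + tn) / len(pairs)  # overall agreement with pathology
    return sen, spe, acc


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two raters' category labels."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used a single identical category throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels for illustration only (not study data).
    pathology = [0, 0, 1, 1, 0, 1]
    radiologist = ["3", "4A", "4B", "5", "3", "3"]
    cad = ["3", "3", "4B", "5", "3", "4C"]
    metrics = binary_metrics(pathology, [to_binary(c) for c in radiologist])
    print(tuple(round(m, 2) for m in metrics))      # (0.67, 0.67, 0.67)
    print(round(cohen_kappa(radiologist, cad), 2))  # 0.52
```

A κ above 0.6 is conventionally read as substantial agreement, which is the threshold the abstract uses. For ordinal BI-RADS categories, a weighted kappa that penalizes distant disagreements more heavily would be a reasonable alternative.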
               