Background: Ultrasound (US) is a valuable technique for detecting degenerative findings and intrasubstance tears in lateral elbow tendinopathy (LET). Machine learning methods can support this radiological diagnosis.
Aim: To assess machine learning models for multilabel classification of degenerative findings and intrasubstance tears in US images from patients with a LET diagnosis.
Materials and methods: A retrospective study was performed. US images and medical records from patients with a LET diagnosis from January 1st, 2017, to December 30th, 2018, were selected. Datasets were built for training and testing the models. For image analysis, feature extraction covered texture characteristics, intensity distribution, pixel-pixel co-occurrence patterns, and scale granularity. Six different supervised learning models were implemented for binary and multilabel classification. All models were trained to classify four tendon findings (hypoechogenicity, neovascularity, enthesopathy, and intrasubstance tear). Accuracy indicators and their confidence intervals (CI) were obtained for all models using repeated K-fold cross-validation. Multilabel prediction was measured with multilabel accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC), each with 95% CI.
Results: A total of 30,007 US images (4,324 exams, 2,917 patients) were included in the analysis. In the binary classification, the random forest (RF) model presented the highest mean area under the curve (AUC), sensitivity, and specificity for each degenerative finding. AUC and sensitivity were highest for intrasubstance tear, at 0.991 [95% CI, 0.99, 0.99] and 0.775 [95% CI, 0.77, 0.77], respectively. Specificity, in contrast, was highest for hypoechogenicity, at 0.821 [95% CI, 0.82, 0.82]. In the multilabel classifier, RF also presented the highest performance: accuracy was 0.772 [95% CI, 0.771, 0.773], with a macro-averaged AUC of 0.948 [95% CI, 0.94, 0.94] and a micro-averaged AUC of 0.962 [95% CI, 0.96, 0.96]. Diagnostic accuracy, sensitivity, and specificity with 95% CI were calculated.
Conclusion: Machine learning algorithms applied to US images of LET showed high diagnostic accuracy. The random forest model in particular showed the best performance in both the binary and multilabel classifiers, especially for intrasubstance tears.
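The macro and micro AUC scores reported above differ in how they aggregate across the four tendon findings. The following is a minimal sketch of that distinction, not the study's implementation: the function names are hypothetical, the toy numbers are invented for illustration and are not study data, and treating "multilabel accuracy" as exact-match (subset) accuracy is an assumption, since the abstract does not define it.

```python
from typing import List, Sequence


def binary_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Compute one AUC per label (column), then take the unweighted mean."""
    n_labels = len(Y_true[0])
    per_label = [
        binary_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def micro_auc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Pool every (image, label) decision into one binary problem."""
    flat_true = [v for row in Y_true for v in row]
    flat_score = [v for row in Y_score for v in row]
    return binary_auc(flat_true, flat_score)


def subset_accuracy(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Fraction of images whose full label vector is predicted exactly.
    Assumption: one possible reading of 'multilabel accuracy'."""
    hits = sum(1 for t, p in zip(Y_true, Y_pred) if list(t) == list(p))
    return hits / len(Y_true)


if __name__ == "__main__":
    # Toy data only. Columns: hypoechogenicity, neovascularity,
    # enthesopathy, intrasubstance tear.
    Y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
    Y_score = [
        [0.9, 0.2, 0.4, 0.8],
        [0.3, 0.7, 0.1, 0.2],
        [0.6, 0.6, 0.9, 0.3],
        [0.2, 0.1, 0.5, 0.7],
    ]
    Y_pred = [[int(s >= 0.5) for s in row] for row in Y_score]
    print("macro AUC:", macro_auc(Y_true, Y_score))
    print("micro AUC:", micro_auc(Y_true, Y_score))
    print("subset accuracy:", subset_accuracy(Y_true, Y_pred))
```

Macro averaging weights each finding equally regardless of how often it occurs, while micro averaging pools all image-label decisions, so frequent findings and differences in score scale between labels influence it more. This is one reason the two values can diverge, as they do in the reported results.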
               