To the Editor: We are writing to comment on the article entitled "Intra- and Inter-Rater Agreement Describing Myometrial Lesions Using Morphologic Uterus Sonographic Assessment: A Pilot Study," published by Rasmussen et al in the October 2019 issue of the Journal of Ultrasound in Medicine. The authors aimed to evaluate intra- and inter-rater agreement for myometrial lesions using Morphologic Uterus Sonographic Assessment terminology. That pilot study included 13 raters from 10 different hospitals in Europe and the United States. The raters, of high (n = 6) or medium (n = 7) experience, assessed 30 three-dimensional ultrasound clips with (n = 20) and without (n = 10) benign myometrial lesions. Myometrial lesions were reported as poorly or well defined and then systematically evaluated for the presence of individual features. Intra- and inter-rater agreement was then calculated with κ statistics.

In the results, the reporting of poorly defined lesions reached moderate intra-rater agreement (κ = 0.49 [high experience] and 0.47 [medium experience]) and poor inter-rater agreement (κ = 0.39 [high experience] and 0.25 [medium experience]). The reporting of well-defined lesions reached good to very good intra-rater agreement (κ = 0.73 [high experience] and 0.82 [medium experience]) and good inter-rater agreement (κ = 0.75 [high experience] and 0.63 [medium experience]).

Although the article provides valuable information, there are some substantial points that, if considered, would help clarify the method and support an accurate interpretation of the study. Reporting myometrial lesions without regard for the prevalence of those lesions, and without attention to the limitations of κ, will lead to conflicting results, as occurred with the grouping of the raters in this study. Regarding reliability, applying the simple κ to qualitative variables with more than 2 categories is a common mistake, because κ has its own limitations: first, it depends on the prevalence of each category; second, it depends on the number of categories. When a variable has more than 2 categories or is measured on an ordinal scale (≥3 ordered categories), the weighted κ is the better choice. Finally, another important flaw arises when the raters have unequal marginal distributions of their responses. To assess inter-rater reliability among multiple raters, we suggest the Fleiss κ as an appropriate test. However, the authors used neither of these commonly applied statistics (the Fleiss or weighted κ) to assess reliability in this study. In brief, reporting the simple κ coefficient and relying on it to assess reliability can convey a misleading message.
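For illustration, a minimal sketch of how the weighted κ and the Fleiss κ discussed above could be computed follows. It is our own example, not part of the original study: the ratings are hypothetical, and the scikit-learn and statsmodels Python libraries are assumed to be available.

import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ordinal ratings (0 = no lesion, 1 = poorly defined,
# 2 = well defined) from two raters on the same 10 clips.
rater_a = [0, 1, 2, 2, 1, 0, 2, 1, 0, 2]
rater_b = [0, 2, 2, 1, 1, 0, 2, 1, 1, 2]

# The weighted κ penalizes disagreements by their ordinal distance,
# unlike the simple (unweighted) κ.
print(cohen_kappa_score(rater_a, rater_b, weights="linear"))

# The Fleiss κ generalizes chance-corrected agreement to more than
# two raters. Rows are subjects (clips), columns are raters, and
# entries are the assigned category labels.
ratings = np.array([
    [0, 0, 1],
    [1, 1, 2],
    [2, 2, 2],
    [2, 1, 2],
    [1, 1, 0],
])
table, _ = aggregate_raters(ratings)  # per-subject counts for each category
print(fleiss_kappa(table, method="fleiss"))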