This paper aims to assess the interrater reliability of standardized patients (SPs) as they assess the clinical skills of medical students and to detect possible rating bias among SPs. The ratings received by 6 students examined at 4 clinical stations by 13 SPs were analyzed. Each SP contributed at least 3 and at most 10 pairwise ratings, with an average of approximately 5 ratings per SP. The standard Cohen's kappa statistic was calculated, and the distribution of scores among SPs was compared using both ANOVA and the Kruskal-Wallis H test (one-way ANOVA by ranks). Furthermore, the numbers of discrepancies between pairwise raters (showing either "positive" or "negative" bias in the rating) were analyzed using ANOVA and a χ2 goodness-of-fit test. The conventional method, which compared the kappa statistics of the raters (including prevalence-adjusted, bias-adjusted kappa scores), did not reject the null hypothesis that the raters (SPs) are similar. However, analysis of the distribution of discrepancies among the raters revealed that the differences between raters cannot be attributed to chance, particularly when a distinction was made between their overall positive and negative bias. A strong (p < 0.001) negative bias was detected, and the SPs responsible for this bias were identified. The statistical method proposed here, which explicitly takes into account the positive and negative bias of the raters, is more sensitive than the conventional method (Cohen's kappa). Since the outliers (the biased SPs) affect the fairness of the grading of medical students, it is important to detect any statistically significant bias in the rating and to adjust the SPs' assessments accordingly.
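The abstract does not reproduce the underlying data or code, so the following Python sketch only illustrates, on hypothetical binary pass/fail ratings, the kind of per-SP analysis it describes: Cohen's kappa and the prevalence-adjusted bias-adjusted kappa (PABAK) for agreement, a count of signed discrepancies (positive vs. negative bias), and a χ2 goodness-of-fit test on those counts. The arrays `sp_ratings` and `ref_ratings` are invented for illustration and do not correspond to the study's data.

```python
# Illustrative sketch (hypothetical data), not the authors' actual analysis code.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary pass/fail ratings: each position is one pairwise
# comparison between the SP under study and a co-rater on the same encounter.
sp_ratings = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
ref_ratings = np.array([1, 0, 0, 1, 1, 1, 1, 0, 0, 1])

# Standard Cohen's kappa: chance-corrected agreement between the two raters.
kappa = cohen_kappa_score(sp_ratings, ref_ratings)

# Prevalence-adjusted, bias-adjusted kappa for two raters on a binary scale:
# PABAK = 2 * p_observed - 1, where p_observed is the raw agreement rate.
p_obs = np.mean(sp_ratings == ref_ratings)
pabak = 2 * p_obs - 1

# Signed discrepancies: "positive" bias = SP rates higher than the co-rater,
# "negative" bias = SP rates lower.
pos_bias = int(np.sum(sp_ratings > ref_ratings))
neg_bias = int(np.sum(sp_ratings < ref_ratings))

# Chi-square goodness-of-fit test of the discrepancy counts against the null
# hypothesis that positive and negative discrepancies are equally likely.
chi2, p_value = stats.chisquare([pos_bias, neg_bias])

print(f"kappa={kappa:.2f}, PABAK={pabak:.2f}, "
      f"positive={pos_bias}, negative={neg_bias}, chi2 p={p_value:.3f}")
```

In the study, the corresponding counts would be accumulated per SP across all of that SP's pairwise ratings, and the distribution of scores and discrepancies across SPs would additionally be compared with ANOVA and the Kruskal-Wallis H test.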
               