"Statistical significance comparing cricothyroidotomy techniques"

Yeow et al. found a one-second difference in median time to insertion of a tracheal tube with successful ventilation using a novel cricothyroidotomy introducer compared with scalpel cricothyroidotomy (85 s vs. 84 s, respectively) [1]. However, despite the novel introducer having the greater median, the authors concluded its insertion was faster, with a p value (Wilcoxon’s Signed Ranks (WSR) test) of 0.030. In fact, the study was powered to demonstrate whether the insertion times for a pair of techniques differed by more than a threshold value of 45 s, based on a previous study [2]. This calculation is based on the assumption of a normal distribution, but the authors noted that their data were not normally distributed. In addition, their groups were paired rather than randomised, further invalidating the assumptions underlying their power calculation. Re-analysing their data appropriately by comparing a difference between means (assuming these are similar to the medians) of one second would not have yielded a statistically significant result. In effect, the statistical significance reported was disconnected from the authors’ original power calculation. Based on their findings, a future study would require 200,992 participants across both groups to detect a statistically significant one-second difference between means (80% power, α error 0.05). Regarding their use of the WSR test, it is unclear what their null hypothesis was, and, therefore, to what measure the p value related. The only indication from the paper is that the p value appeared to relate to a comparison of the medians, as it is noted immediately following the comparison of medians (IQR [range]). However, a WSR test does not directly compare the original distributions’ respective medians. It analyses the median of differences between the pairs, and whether these are symmetrically distributed around zero or not.
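To illustrate why a one-second difference demands such a large sample, the following is a minimal sketch of the textbook normal-approximation sample-size formula for an unpaired, two-sided comparison of two means. The letter does not state which formula or standard deviation was used, so the function name `n_per_group` and the value `ASSUMED_SD` are illustrative assumptions only, and the sketch is not expected to reproduce the quoted figure exactly.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation with equal group sizes and a common
    standard deviation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the alpha error
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical standard deviation in seconds; NOT a value reported in the letter.
ASSUMED_SD = 80.0

# Total participants across both groups for the original 45 s threshold
# and for a one-second difference, under the same assumed spread.
total_for_45s = 2 * n_per_group(delta=45.0, sd=ASSUMED_SD)
total_for_1s = 2 * n_per_group(delta=1.0, sd=ASSUMED_SD)

print("Total for a 45 s difference:", total_for_45s)
print("Total for a 1 s difference: ", total_for_1s)
```

Because the required size scales with (sd / delta) squared, moving from a 45 s threshold to a 1 s difference multiplies the sample size by roughly 45² = 2025, whatever spread is assumed. With the assumed spread used here, the total for a one-second difference is on the order of 200,000 participants.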
Not only is the WSR test tenuously related to a test of medians, the lack of a summary statistic in highlighting the comparison between the groups to which the p value pertains is noted to be a difficulty with its use [3]. Additionally, it has previously been demonstrated that, under a null hypothesis of a median of differences of zero, depending on the absolute value of the skewness of the distribution of differences, the type-1 error rate is often greater than the nominal 0.05 [4]. Kasuya notes that the test should not be used when the distribution of differences is asymmetric, as it loses its validity by inflating the rate of type-1 error, that is, inappropriately rejecting the null hypothesis. These issues with the use of this test are worth highlighting further. The WSR test may yield a p value < 0.05 under three different circumstances [5]: (1) the median of differences within a symmetric distribution is not zero; (2) the median of differences within an asymmetric distribution is not zero; and (3) the median of differences within an asymmetric distribution is zero. Concerning this study, although the distribution of the differences in pairs is not presented, it is likely to be asymmetric, given the fact that the parent distributions appear to have different levels of positive skewness, based on the box plots. Therefore, circumstances 2 or 3 were the likely candidates for rejection of the null hypothesis upon reaching a p value < 0.05, that is, a significant result was unlikely to be related to a location shift of the median of differences alone, if at all, while likely being related (perhaps exclusively) to an asymmetric distribution of differences due to underlying differences in skewness (with questionable clinical relevance). Although circumstance 2 would still suggest that the median of differences was not zero, such a finding was not commented upon.
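Circumstance 3 above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a re-analysis of the study data: the number of pairs, the number of simulations, and both distributions of differences are hypothetical choices, and the p value uses the large-sample normal approximation to the signed-rank statistic without a continuity correction. Both simulated distributions have a true median of zero; only their symmetry differs.

```python
import math
import random
from statistics import NormalDist


def wsr_p_value(diffs):
    """Two-sided Wilcoxon signed-rank p value via the normal approximation.

    Assumes continuous data, so there are no zero differences and no ties.
    """
    n = len(diffs)
    ranked = sorted(diffs, key=abs)  # rank the differences by absolute size
    w_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(draw, n_pairs=50, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies in which the test gives p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diffs = [draw(rng) for _ in range(n_pairs)]
        if wsr_p_value(diffs) < alpha:
            rejections += 1
    return rejections / n_sims


def symmetric(rng):
    # Standard normal differences: symmetric, median zero.
    return rng.gauss(0.0, 1.0)


def skewed(rng):
    # Exponential(1) shifted by its median ln(2): positively skewed, median zero.
    return rng.expovariate(1.0) - math.log(2)


sym_rate = rejection_rate(symmetric)
skew_rate = rejection_rate(skewed)

print(f"Rejection rate, symmetric differences (median 0): {sym_rate:.3f}")
print(f"Rejection rate, skewed differences (median 0):    {skew_rate:.3f}")
```

In this sketch the symmetric case should reject at close to the nominal rate, while the skewed case rejects noticeably more often even though its median of differences is zero, which is the inflation of type-1 error described above.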
With regard to the original study design, such a small analysis could potentially have been undertaken with unpaired, randomised groups. It may also have been possible, as results from a previous study were available, that a transformation (e.g. a logarithmic transformation, particularly as these data appear to have a positive skew as per the box plot) could have been performed in order to render the data normally distributed. Both of the above points are pertinent to utilising parametric tests. This would have allowed a direct comparison of central tendency in each group, fulfilling the purpose for which the original power calculation was performed. Such a comparison of means would have allowed for easier summary of outcomes, and, arguably, improved reader understanding with respect to its clinical relevance. In light of this discussion, the clinical relevance of the study’s primary outcome comparison is unclear, particularly as the statistical analysis does not provide a clear summary statistic or null hypothesis to which the p value relates. At the very least, it is clearly unrelated to the original medians, which, contrary to the conclusions from the WSR p value,

Keywords: median differences; null hypothesis; study; value; test; distribution

Journal Title: Anaesthesia
Year Published: 2019
