“We are what we eat.” Raised on a soup of Kempthorne's Chapter 7 (which laid out the basis for inference from randomized clinical trials (RCTs)), Cochran's eminent practicality in his text and classes (don't let arcane theory prevent you from looking at actual data), and Cornfield's remarkable ability to experiment with and rethink his views, I approach the analysis of RCTs in a somewhat different spirit from that expressed in the provocative paper by Rosenberger, Uschner, and Wang (RUW). They and I start on the same page: we all accept the view that their Figure 2, the randomization model, presents the appropriate structure of inference from RCTs, while the frequently applied “invoked model” depicted in the second panel of their Figure 1 does not.

The randomization model views an RCT as a self‐contained entity: it allows formal inference only into itself; inference onto a wider population requires a leap of biological, medical, and sociological faith (much to the consternation of many survey samplers). Often expressed in terms of the conflict between internal and external validity, this fundamental duality informs the interpretation of results of RCTs: to whom the results should apply requires judgment by physicians, regulators, payers, and patients. If the results were relevant only to participants in the trial, neither the private nor the public sector would spend hundreds of millions of dollars developing a drug or device; the conviction that the results are relevant outside the narrow confines of the trial population drives development of new therapies.

The question on the table, then, is not which model is correct, but how actually to analyze data from RCTs. RUW argue that in earlier decades, statisticians knew that randomization tests represented the correct approach to analysis, but their lack of access to fast computers forced the use of likelihood‐based methods.
Now, RUW contend, the situation is reversed: we have easy access to high‐powered computers, but we statisticians have lost our way; we no longer appreciate the need for randomization tests. RUW are closer than I am to knowing how students are being taught, and it may well be true that the crucial distinction between internal and external validity is no longer central to the training of statisticians (if that is correct, some aspects of statistical education need to revert to earlier practice). Where we part company is in concluding that we should therefore replace our current reliance on model‐based inference with randomization tests.

Back to Kempthorne's Chapter 7, which teaches that tests based on normal theory generally produce excellent approximations to true randomization tests. My own take‐away from that lesson was to fear ordinary statistical tests in RCTs only when some statistical aspect of the trial deviated markedly from the typical case. I have been involved in three trials whose properties were so atypical that I felt the need to perform randomization or permutation tests.

The first was a trial of motexafin gadolinium in patients with brain cancer. Because the trial was open label, those of us involved in its design were concerned that use of conventionally sized small blocks within centers would allow investigators to deduce the size of the blocks and thereby gain the potential to manipulate assignment of patients to the experimental or control arm. We therefore used an urn model to allocate assignments and, because of our uncertainty about the applicability of the theory that defended the use of model‐based statistics, we performed a rerandomization test.

The second case involved a study of fish oil for reduction of triglycerides (TG) among people whose levels were above 500 mg/dL. Here, the problem was the extreme skewness of TG levels.
Rather than using medians or transformations, we decided to see how a simple analysis of covariance of TG level with baseline TG as a covariate, stratified by the randomization strata, would compare with the same analysis based on a randomization distribution.

The third case involved a Phase 3 trial of voretigene neparvovec, a gene therapy for patients with biallelic RPE65 mutation‐associated retinal dystrophy. The study, which used a novel outcome with a new scoring system, randomized only 21 participants to gene therapy and 10 to placebo. The small sample size, coupled with our uncertainty about the distributional properties of the scoring system, led us to perform an actual permutation test. Because of the small sample size, we were able to enumerate all possible permutations of treatment assignments and calculate the permutation‐test P‐value. In all three
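For readers who want to see the mechanics of exact enumeration, the following is a minimal sketch of a two‐sample permutation test of the kind described for the third trial. The data and the difference‐in‐means statistic here are hypothetical illustrations (not the trial's outcomes or its scoring system), and the sketch assumes simple unstratified allocation; the actual test would enumerate assignments under the trial's own randomization scheme.

```python
import itertools
import statistics

def exact_permutation_test(treated, control):
    """Exact two-sample permutation test on the difference in means.

    Enumerates every relabeling of the pooled observations into groups of
    the original sizes and returns the two-sided P-value: the fraction of
    relabelings whose absolute mean difference is at least as large as the
    observed one (the observed assignment is included in the enumeration).
    """
    pooled = treated + control
    n = len(pooled)
    n_treated = len(treated)
    observed = statistics.mean(treated) - statistics.mean(control)

    extreme = 0
    total = 0
    for idx in itertools.combinations(range(n), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in chosen]
        c = [pooled[i] for i in range(n) if i not in chosen]
        diff = statistics.mean(t) - statistics.mean(c)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total

# Hypothetical toy data, 6 treated vs 4 control (not trial results)
treated = [12.1, 10.8, 13.5, 11.9, 12.7, 13.0]
control = [9.8, 10.2, 11.1, 9.5]
p = exact_permutation_test(treated, control)
```

With 31 participants split 21 to 10, the number of possible assignments is large but finite, so complete enumeration of this kind remains computationally feasible; for bigger trials one would instead sample rerandomizations by Monte Carlo.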