Missing data are ubiquitous in aging studies. Combining the National Health and Nutrition Examination Survey (NHANES) 2003/2004 and 2005/2006 cross-sectional aging studies (N = 9307), we investigated the effects of both real and simulated missing data on the Frailty Index (FI) and on survival analysis, and evaluated several mitigation strategies. We observed distinct block patterns of missing variables in the dataset. These blocks showed significant hazard rate (HR) differences between participants for whom they were missing and those for whom they were present, indicating that missingness cannot simply be ignored. Simulations of this patterned missingness produced a bias of 0.0112 ± 0.0008 in the mean FI when missing values were ignored, corresponding to a change in hazard of 1.09 ± 0.01. A similar bias of 0.0106 ± 0.0001 was estimated for the real missingness. Imputation corrected the bias when it used the multivariate imputation by chained equations (MICE) method with the classification and regression tree (CART) prediction model, combined with rule-based imputation. Adding auxiliary variables (CART+Aux) further improved the performance of CART. Well-performing imputation models, especially CART+Aux, increased the predictive power of the FI and the reliability of the HR estimates. In contrast, the default MICE models, predictive mean matching/logistic regression (PMM/logreg), introduced even stronger biases in the FI. Our results demonstrate that the calibration of the FI as a mortality predictor depends on how missing data are handled. Ignoring missing values when calculating the FI may be an acceptable strategy in clinical settings where the FI is used as a rough predictor of adverse outcomes. Where the FI is to be compared across studies or populations, judicious imputation, cognizant of the risks carried by poor imputation, should be used to ensure the reliability and precision of statistical estimates and conclusions.
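The abstract does not spell out how the FI is computed or how "ignoring missing values" enters it. A common construction, assumed here rather than taken from the source, is the deficit-accumulation ratio: the sum of deficit scores (each in [0, 1]) divided by the number of items, where ignoring missingness means dropping unmeasured items from both numerator and denominator. The minimal Python sketch that follows illustrates that assumption, shows how a missing block that is correlated with deficit status shifts an individual's FI, and relates an FI shift to a hazard multiplier under the further assumption of a log-linear (Cox-type) model, h(t | FI) = h0(t) · exp(β · FI). The function names, the example deficit vectors, and the back-calculated β are hypothetical illustrations, not the authors' code or reported parameters.

```python
import math
from typing import Optional, Sequence


def frailty_index(scores: Sequence[Optional[float]]) -> float:
    """Deficit-accumulation FI that ignores missing items (assumed definition).

    Each entry is a deficit score in [0, 1] (0 = deficit absent,
    1 = fully present) or None if the item was not measured. Missing
    items are dropped from both the numerator and the denominator.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("at least one deficit must be observed")
    if any(not 0.0 <= s <= 1.0 for s in observed):
        raise ValueError("deficit scores must lie in [0, 1]")
    return sum(observed) / len(observed)


def hazard_multiplier(beta: float, fi_shift: float) -> float:
    """Hazard multiplier exp(beta * shift) for an FI shift, assuming a
    log-linear model h(t | FI) = h0(t) * exp(beta * FI)."""
    return math.exp(beta * fi_shift)


def implied_beta(hazard_change: float, fi_shift: float) -> float:
    """Coefficient beta that would map an FI shift onto a given hazard
    multiplier under the same log-linear assumption."""
    return math.log(hazard_change) / fi_shift


if __name__ == "__main__":
    # Hypothetical person with 10 candidate deficits, 3 of them present.
    complete = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    # Same person, but two deficit-free items fall in a missing block.
    with_gaps = [1, 1, 1, None, None, 0, 0, 0, 0, 0]
    print(frailty_index(complete))   # 0.3
    print(frailty_index(with_gaps))  # 0.375: ignoring the gap inflates the FI

    # Back-calculate beta from the abstract's reported bias and hazard
    # change (illustrative only; the source does not report beta).
    beta = implied_beta(1.09, 0.0112)
    print(round(beta, 2))                             # 7.69
    print(round(hazard_multiplier(beta, 0.0112), 2))  # 1.09
```

In this toy example the direction of the bias depends entirely on which items fall in the missing block: had the missing items been deficits that were present, ignoring them would have lowered the FI instead. In a real analysis β would be estimated by fitting a survival model to the data rather than back-calculated, and imputation (such as the MICE/CART approach described above) would replace the `None` entries before the FI is computed.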