Background: It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models than traditional regression methods.
Objective: To compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.
Design: This was a cohort study.
Setting: Veterans Affairs (VA) EHR data.
Participants: Veterans older than 50 years with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each).
Measurements and Analytic Methods: The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements, including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared the discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.
Results: The cohort's mean age (SD) was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Using all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835–0.839]. The full (unselected) logistic regression model had the highest c-statistic among the regression models (0.833, 95% CI: 0.830–0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830–0.834), with minimal overfitting. All models were well calibrated and had similar diagnostic test characteristics.
Limitation: Our results should be confirmed in non-VA EHRs.
Conclusion: The difference in c-statistic between the best machine learning model (924-predictor Gradient Boosting, 0.837) and the 101-predictor stepwise logistic model (0.832) for 10-year mortality prediction was modest, suggesting that stepwise regression remains a reasonable approach to developing VA EHR mortality prediction models.
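For readers unfamiliar with the metrics compared above, the following minimal sketch shows how a c-statistic and the diagnostic test characteristics can be computed from predicted risks. It is not the authors' code: the function names, the 0.5 risk threshold, and the eight-person toy data are all hypothetical and chosen only for illustration. The pairwise loop is quadratic in cohort size, so a cohort the size of the one studied here would in practice call for a rank-based calculation.

```python
from typing import Dict, Sequence


def c_statistic(died: Sequence[int], risk: Sequence[float]) -> float:
    """Share of (died, survived) pairs in which the person who died was
    assigned the higher predicted risk; tied risks count as half."""
    events = [r for d, r in zip(died, risk) if d == 1]
    non_events = [r for d, r in zip(died, risk) if d == 0]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def diagnostic_characteristics(
    died: Sequence[int], risk: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'risk >= threshold'
    is treated as a positive test. Undefined ratios are returned as NaN."""
    tp = sum(1 for d, r in zip(died, risk) if d == 1 and r >= threshold)
    fp = sum(1 for d, r in zip(died, risk) if d == 0 and r >= threshold)
    fn = sum(1 for d, r in zip(died, risk) if d == 1 and r < threshold)
    tn = sum(1 for d, r in zip(died, risk) if d == 0 and r < threshold)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: 1 = died within 10 years, 0 = survived.
toy_died = [1, 1, 1, 0, 0, 0, 1, 0]
toy_risk = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.1]

toy_c = c_statistic(toy_died, toy_risk)
toy_dx = diagnostic_characteristics(toy_died, toy_risk, threshold=0.5)
print(f"c-statistic: {toy_c:.4f}")
print(toy_dx)
```

A c-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why a difference such as 0.837 versus 0.832 is read as modest.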
               