e42 | www.epidem.com © 2020 Wolters Kluwer Health, Inc. All rights reserved.

To the Editor: Machine learning techniques may improve risk prediction and disease screening. Class imbalance (a ratio of noncases to cases > 1) routinely occurs in epidemiologic data and may degrade the predictive performance of machine learning algorithms.1–4 Of the many techniques developed to address class imbalance,5,6 we investigated simple undersampling here. This method is straightforward and accessible, but evidence on its performance is mixed, and practical guidance is needed. Using simulated data, we assessed the predictive performance of the ensemble machine learning algorithm SuperLearner and of logistic regression in imbalanced and undersampled data to investigate whether undersampling alters predictive accuracy.
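
To make the terms concrete, the following is a minimal sketch of simple random undersampling. It is not the authors' code. It assumes the common variant in which every case is kept and noncases are randomly discarded until a chosen noncase-to-case ratio is reached. The function names `imbalance_ratio` and `undersample`, the 1:1 target ratio, and the toy data are all illustrative assumptions.

```python
import random


def imbalance_ratio(labels):
    """Return the ratio of noncases (label 0) to cases (label 1)."""
    cases = sum(1 for label in labels if label == 1)
    noncases = len(labels) - cases
    if cases == 0:
        raise ValueError("labels contain no cases")
    return noncases / cases


def undersample(X, y, target_ratio=1.0, seed=0):
    """Keep every case and a random subset of noncases.

    target_ratio is the desired noncase-to-case ratio after sampling
    (an assumption for this sketch; 1.0 gives a balanced data set).
    """
    rng = random.Random(seed)
    case_idx = [i for i, label in enumerate(y) if label == 1]
    noncase_idx = [i for i, label in enumerate(y) if label == 0]
    # Never request more noncases than are available.
    n_keep = min(len(noncase_idx), round(target_ratio * len(case_idx)))
    kept = sorted(case_idx + rng.sample(noncase_idx, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data (hypothetical): 10 cases among 100 observations, a 9:1 imbalance.
y = [1] * 10 + [0] * 90
X = [[float(i)] for i in range(100)]

X_bal, y_bal = undersample(X, y, target_ratio=1.0, seed=42)
print(imbalance_ratio(y), imbalance_ratio(y_bal))
```

A model would then be fit to `X_bal` and `y_bal` rather than to the full data, and its predictive performance compared with that of a model fit to the imbalanced data.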
               