
The impact of undersampling on the predictive performance of logistic regression and machine learning algorithms: A simulation study.



e42 | www.epidem.com © 2020 Wolters Kluwer Health, Inc. All rights reserved.

To the Editor: Machine learning techniques may improve risk prediction and disease screening. Class imbalance (a ratio of noncases to cases > 1) routinely occurs in epidemiologic data and may degrade the predictive performance of machine learning algorithms.1–4 Of the many techniques developed to address class imbalance,5,6 we investigated simple undersampling here. This method is straightforward and accessible, but evidence on its performance is mixed, and practical guidance is needed. Using simulated data, we assessed the predictive performance of the ensemble machine learning algorithm SuperLearner and of logistic regression in imbalanced and undersampled data to investigate whether undersampling alters predictive accuracy.
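Below is a minimal pure-Python sketch of simple random undersampling, the technique the letter examines. It assumes binary labels coded 1 for cases and 0 for noncases. The function names `imbalance_ratio` and `random_undersample`, the `ratio` parameter, and the simulated data are hypothetical illustrations, not the authors' code. The letter's SuperLearner and logistic regression simulations are not reproduced here.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return noncases / cases for 0/1 labels; a value > 1 indicates class imbalance."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no cases (label 1) in the data")
    return counts[0] / counts[1]


def random_undersample(rows, labels, ratio=1.0, seed=None):
    """Simple random undersampling of the majority class (noncases, label 0).

    Keeps every case and a random subset of noncases, drawn without
    replacement, so that noncases / cases is at most `ratio` (the number of
    noncases kept is rounded down; ratio=1.0 yields a balanced sample).
    Returns new (rows, labels) lists; the inputs are not modified.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(labels) if y == 1]
    noncase_idx = [i for i, y in enumerate(labels) if y == 0]
    # Never request more noncases than exist in the data.
    n_keep = min(len(noncase_idx), int(ratio * len(case_idx)))
    kept = sorted(case_idx + rng.sample(noncase_idx, n_keep))  # keep original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical simulated data: one Gaussian predictor, about 5% cases.
    sim = random.Random(0)
    labels = [1 if sim.random() < 0.05 else 0 for _ in range(2000)]
    rows = [[sim.gauss(0, 1)] for _ in labels]
    print("before:", round(imbalance_ratio(labels), 1))
    rows_bal, labels_bal = random_undersample(rows, labels, ratio=1.0, seed=1)
    print("after:", imbalance_ratio(labels_bal))  # 1.0
```

In practice, undersampling would normally be applied only to the training data, and performance would be evaluated on data that keep the original class balance. Undersampling raises the case prevalence in the training sample; in the usage example, it rises from roughly 5% to 50%. As a result, predicted probabilities from a model fit to undersampled data are not calibrated to the original population unless they are adjusted. This is one way undersampling can alter predictive performance.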

Keywords: logistic regression; machine; predictive performance; machine learning; learning algorithms

Journal Title: Epidemiology
Year Published: 2020
