Being Aware of Data Leakage and Cross‐Validation Scaling in Chemometric Model Validation

The main goal of our investigation is to raise awareness among chemometricians of how easily data or parameter leakage can be introduced by inappropriate methods, and to demonstrate that opinions found in the literature on the preference for leave‐one‐out, leave‐many‐out, and repeated cross‐validation require careful, precise interpretation. We show how the Kennard–Stone method and the inappropriate use of repeated measurements cause data leakage in train/test splitting. We demonstrate how cross‐validation parameters become overoptimistic if they are used for hyperparameter selection or variable selection; we call this effect parameter leakage. We extend the leave‐one‐out/leave‐many‐out scaling law to repeated cross‐validation. Finally, we discuss, and justify with model calculations, that the infinite‐sample‐size inconsistency of leave‐one‐out cross‐validation relative to leave‐many‐out cross‐validation can be theoretically important but need not be relevant at the data sizes encountered in practical chemometrics.
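One of the leakage routes named above, inappropriate use of repeated measurements, occurs when replicate measurements of the same physical sample end up on both sides of a train/test split. A minimal sketch of a common safeguard follows, assuming each record carries a sample label: a group-aware k-fold split (the same idea as scikit-learn's `GroupKFold`) that keeps all replicates of a sample in a single fold. The helper names `group_kfold` and `shared_groups` and the toy labels are hypothetical; this illustrates the general technique and is not the authors' procedure.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Hypothetical helper: k-fold split that never separates replicates.

    `groups[i]` is the sample label of record i; all records sharing a label
    are placed in the same test fold. Groups are assigned largest first to the
    currently smallest fold to keep fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if not 2 <= n_splits <= len(members):
        raise ValueError("n_splits must be between 2 and the number of groups")

    folds = [[] for _ in range(n_splits)]
    # Sort by descending group size, then by label, so the split is deterministic.
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        target = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[target].extend(members[label])

    splits = []
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(groups)) if i not in test_set]
        splits.append((train, test))
    return splits


def shared_groups(train, test, groups):
    """Hypothetical helper: sample labels present in both train and test (leakage)."""
    return {groups[i] for i in train} & {groups[i] for i in test}


# Toy data (hypothetical): 4 samples, each measured 3 times.
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3

# Naive record-level split: alternate records go to train and test.
naive_train, naive_test = list(range(0, 12, 2)), list(range(1, 12, 2))
print("naive split leaks:", sorted(shared_groups(naive_train, naive_test, groups)))
# → naive split leaks: ['A', 'B', 'C', 'D']

for train, test in group_kfold(groups, n_splits=2):
    # Each fold holds whole samples, so no label is shared: leaks: []
    print("test fold:", test, "leaks:", sorted(shared_groups(train, test, groups)))
```

The greedy largest-group-first assignment is just one simple way to keep folds balanced when samples have unequal numbers of replicates; for the purpose of avoiding leakage, any assignment works as long as no sample label is split across folds.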

Keywords: validation; data leakage; model; cross validation

Journal Title: Journal of Chemometrics
Year Published: 2025
