
CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction



Genomic prediction models are usually compared using validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. However, the design of the training and validation sets strongly affects how models are compared and how subjective that comparison is. In these procedures, observations overlap across replicates. Because of this resampling, estimates may be inflated and the residuals may not be independent, which can make the results less accurate. Furthermore, ANOVA and multiple-comparison tests such as Tukey's are not recommended, because the assumption of independent residuals is not met. We therefore propose a new way to sample observations into training and validation sets, based on a cross-validation alpha-based design (CV-α). CV-α is designed to create several validation scenarios (replicates × folds), regardless of the number of genotypes. With CV-α, the number of genotypes that fell in the same fold across replicates was much lower than with K-fold cross-validation, indicating greater independence of the residuals. As a proof of concept, we therefore applied ANOVA to the CV-α results to compare the proposed methodology with RRS and K-fold cross-validation, using four genomic prediction models on a simulated and a real dataset. All validation methods showed similar predictive ability and bias. However, in terms of mean squared error and coefficient of variation, CV-α performed best in the evaluated scenarios. Moreover, because it adds no cost or complexity, it is more reliable and allows models and factors to be compared with non-subjective methods. CV-α can therefore be considered a more precise validation methodology for model selection.
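The overlap argument can be made concrete by counting, for every pair of genotypes, how many replicates place both genotypes in the same fold. This pair-concurrence count is an assumed proxy for the overlap described above, not the authors' metric. The sketch contrasts repeated random K-fold with a classical square lattice, a resolvable incomplete-block design of the kind that alpha designs generalize. The lattice is used only because it is easy to build; it is not the published CV-α procedure. The function names `lattice_folds`, `random_kfold` and `pair_concurrence` are hypothetical.

```python
import itertools
import random
from collections import Counter


def lattice_folds(s, n_reps):
    """Square-lattice fold assignment for s*s genotypes.

    Illustrative assumption only, NOT the published CV-alpha procedure.
    Genotype g sits at grid point (a, b) = divmod(g, s). Replicate 0 uses
    fold = a; replicate r >= 1 uses slope m = r - 1 and fold = (b - m*a) mod s.
    For prime s and n_reps <= s + 1, two genotypes share a fold in at most
    one replicate.
    """
    if n_reps > s + 1:
        raise ValueError("a square lattice supports at most s + 1 replicates")
    reps = []
    for r in range(n_reps):
        fold_of = []
        for g in range(s * s):
            a, b = divmod(g, s)
            fold_of.append(a if r == 0 else (b - (r - 1) * a) % s)
        reps.append(fold_of)
    return reps


def random_kfold(n, k, n_reps, seed=0):
    """Repeated K-fold: reshuffle genotypes independently in each replicate."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        order = list(range(n))
        rng.shuffle(order)
        fold_of = [0] * n
        for pos, g in enumerate(order):
            # Round-robin assignment gives equal fold sizes when k divides n.
            fold_of[g] = pos % k
        reps.append(fold_of)
    return reps


def pair_concurrence(assignments):
    """Count, for each genotype pair, the replicates in which both share a fold."""
    counts = Counter()
    for fold_of in assignments:
        members = {}
        for g, f in enumerate(fold_of):
            members.setdefault(f, []).append(g)
        for group in members.values():
            # Groups are built in increasing g, so each pair is stored as (low, high).
            counts.update(itertools.combinations(group, 2))
    return counts


if __name__ == "__main__":
    s, reps = 7, 4  # 49 genotypes, 7 folds of 7, 4 replicates
    lat = pair_concurrence(lattice_folds(s, reps))
    rnd = pair_concurrence(random_kfold(s * s, s, reps))
    print("lattice max concurrence:", max(lat.values()))
    print("random K-fold max concurrence:", max(rnd.values()),
          "| pairs sharing a fold more than once:",
          sum(v > 1 for v in rnd.values()))
```

With a prime `s` and at most `s + 1` replicates, the lattice never places the same pair together twice, whereas independently reshuffled K-fold typically does. Square lattices only exist for restricted numbers of entries; alpha designs were introduced to handle entry numbers that lattices cannot.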

Keywords: comparison; genomic prediction; methodology; validation; cross validation

Journal Title: Euphytica
Year Published: 2021
