"Two cross-validation techniques to comprehensively characterize global horizontal irradiation regression models: Single data-splitting is insufficient"

Data-splitting is the most widely used method to cross-validate global horizontal irradiation regression models. An available dataset is split into two subsets, one to calibrate the models and the other to validate them. This study investigated whether that method is sufficient by comparing it with two other cross-validation techniques: Monte Carlo cross-validation nested with double cross-validation, and leave-one-year-out cross-validation. These techniques facilitated cross-validation over long- and short-term periods, respectively. They were applied to the De Souza and Hargreaves-Samani temperature-based regression models. Unlike data-splitting, the techniques allowed the models to be fully characterized by the averages and sensitivities (%) of their tuned parameters, the averages and spread of their predictive accuracies (as root mean square errors), and their stability (determined by Monte Carlo only). On a monthly average daily time scale, the Monte Carlo results (parameter sensitivity, root mean square error, and stability, omitting the average tuned parameters) were <6%, 0.56 ± 0.12 and 0.032 MJ m⁻² day⁻¹ for the De Souza model, and <1.5%, 0.94 ± 0.14 and 0.174 MJ m⁻² day⁻¹ for the Hargreaves-Samani model. Similarly, the leave-one-year-out results (parameter sensitivity and root mean square error) were <2% and 0.88 ± 0.28 MJ m⁻² day⁻¹ for the De Souza model, and <1% and 1.31 ± 0.24 MJ m⁻² day⁻¹ for the Hargreaves-Samani model. The De Souza model performed better. We further demonstrated the erroneous assessments that are possible when models are subjected to traditional data-splitting, which proved inadequate. Consequently, we proposed an algorithm to implement our cross-validation techniques that reduces the computational burden of evaluating multiple models. This was achieved by including a novel controlled data-splitting cross-validation subroutine.
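To make the two resampling ideas concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the common single-coefficient form of the Hargreaves-Samani model, H = a · H0 · sqrt(Tmax − Tmin), fitted by least squares, and it uses plain Monte Carlo cross-validation (repeated random splits) rather than the nested double cross-validation described above. The function names, the 30% test fraction, the 200 repetitions, and the synthetic monthly data are all illustrative assumptions.

```python
import numpy as np

def fit_hs(h0, dt, h):
    """Least-squares estimate of a in H = a * H0 * sqrt(dT) (assumed model form)."""
    x = h0 * np.sqrt(dt)
    return float(np.dot(x, h) / np.dot(x, x))

def rmse(h_obs, h_pred):
    """Root mean square error between observed and predicted irradiation."""
    return float(np.sqrt(np.mean((h_obs - h_pred) ** 2)))

def leave_one_year_out(years, h0, dt, h):
    """Hold out each year in turn, calibrate on the rest, validate on the held-out year."""
    results = {}
    for y in np.unique(years):
        test = years == y
        a = fit_hs(h0[~test], dt[~test], h[~test])
        pred = a * h0[test] * np.sqrt(dt[test])
        results[int(y)] = (a, rmse(h[test], pred))
    return results

def monte_carlo_cv(h0, dt, h, n_splits=200, test_frac=0.3, seed=0):
    """Repeat random calibration/validation splits; return tuned a and RMSE per split."""
    rng = np.random.default_rng(seed)
    n = len(h)
    n_test = int(round(test_frac * n))
    a_vals, errs = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        a = fit_hs(h0[train], dt[train], h[train])
        a_vals.append(a)
        errs.append(rmse(h[test], a * h0[test] * np.sqrt(dt[test])))
    return np.array(a_vals), np.array(errs)

# Synthetic monthly data for 10 hypothetical years (illustration only, not measured values).
rng = np.random.default_rng(42)
years = np.repeat(np.arange(2001, 2011), 12)
months = np.tile(np.arange(12), 10)
h0 = 30.0 + 8.0 * np.cos(2.0 * np.pi * months / 12.0)   # extraterrestrial irradiation
dt = rng.uniform(8.0, 14.0, size=years.size)             # Tmax - Tmin
h = 0.16 * h0 * np.sqrt(dt) + rng.normal(0.0, 1.0, size=years.size)

loyo = leave_one_year_out(years, h0, dt, h)
a_mc, rmse_mc = monte_carlo_cv(h0, dt, h)

loyo_rmse = np.array([r for _, r in loyo.values()])
print(f"LOYO RMSE: {loyo_rmse.mean():.2f} +/- {loyo_rmse.std(ddof=1):.2f}")
print(f"MC   RMSE: {rmse_mc.mean():.2f} +/- {rmse_mc.std(ddof=1):.2f}")
print(f"MC   tuned a: {a_mc.mean():.4f} +/- {a_mc.std(ddof=1):.4f}")
```

A single data split corresponds to one iteration of either loop, so it yields one tuned parameter and one error value. Repeating the split is what provides the averages and spreads that the abstract reports.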

Keywords: validation; cross validation; data splitting; regression models; validation techniques

Journal Title: Journal of Renewable and Sustainable Energy
Year Published: 2019
