Data-splitting is the most widely used method to cross-validate global horizontal irradiation regression models. An available dataset is split into two subsets, one to calibrate models and the other to validate them. This study investigated the sufficiency of this method against two other cross-validation techniques: Monte Carlo cross-validation nested with double cross-validation, and leave-one-year-out cross-validation. These techniques facilitated cross-validation over long-term and short-term periods, respectively. They were applied to the De Souza and Hargreaves-Samani temperature-based regression models. Unlike data-splitting, the techniques allowed full characterization of the models by the averages and sensitivities (%) of their tuned parameters, the averages and spread of their predictive accuracies via root mean square errors, and their stability (Monte Carlo-determined). On a monthly average daily time scale, the Monte Carlo results (parameter sensitivity, root mean square error, and stability; average tuned parameters omitted) were <6%, 0.56 ± 0.12 and 0.032 MJ m−2 day−1 for the De Souza model, and <1.5%, 0.94 ± 0.14 and 0.174 MJ m−2 day−1 for the Hargreaves-Samani model. Similarly, the leave-one-year-out results (parameter sensitivity and root mean square error) were <2% and 0.88 ± 0.28 MJ m−2 day−1 for the De Souza model and <1% and 1.31 ± 0.24 MJ m−2 day−1 for the Hargreaves-Samani model. The De Souza model performed better. We further demonstrated the erroneous assessments possible when models are subjected to traditional data-splitting, which proved inadequate. Consequently, we proposed an algorithm to implement our cross-validation techniques that reduces the computational burden of evaluating multiple models. This was achieved by including a novel controlled data-splitting cross-validation subroutine.
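As an illustration of the leave-one-year-out idea only, the sketch below holds out each year in turn, calibrates a one-parameter temperature-based model on the remaining years, and reports the average and spread of the held-out root mean square errors. It is not the authors' algorithm, which the abstract does not spell out. It assumes the commonly cited Hargreaves-Samani form H = k · sqrt(Tmax − Tmin) · H0, where H0 is extraterrestrial irradiation; the data are synthetic, and every function and field name (`fit_k`, `leave_one_year_out`, `make_synthetic`, `dT`, `H0`) is hypothetical.

```python
import math
import random
import statistics


def fit_k(rows):
    """Least-squares k for H = k * sqrt(dT) * H0 (one-parameter fit through the origin)."""
    num = sum(math.sqrt(r["dT"]) * r["H0"] * r["H"] for r in rows)
    den = sum((math.sqrt(r["dT"]) * r["H0"]) ** 2 for r in rows)
    return num / den


def rmse(rows, k):
    """Root mean square error of the model predictions on the given rows."""
    sq_errs = [(k * math.sqrt(r["dT"]) * r["H0"] - r["H"]) ** 2 for r in rows]
    return math.sqrt(sum(sq_errs) / len(sq_errs))


def leave_one_year_out(rows):
    """Hold out each year once: calibrate on the other years, validate on the held-out year."""
    results = []
    for held_out in sorted({r["year"] for r in rows}):
        train = [r for r in rows if r["year"] != held_out]
        test = [r for r in rows if r["year"] == held_out]
        k = fit_k(train)
        results.append((held_out, k, rmse(test, k)))
    return results


def make_synthetic(years=range(2001, 2007), true_k=0.16, seed=0):
    """Synthetic monthly records (assumed values, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for year in years:
        for month in range(1, 13):
            dT = rng.uniform(6.0, 14.0)  # Tmax - Tmin, degrees C
            H0 = 30.0 + 8.0 * math.cos(2 * math.pi * (month - 1) / 12)
            H = true_k * math.sqrt(dT) * H0 + rng.gauss(0.0, 0.8)
            rows.append({"year": year, "month": month, "dT": dT, "H0": H0, "H": H})
    return rows


rows = make_synthetic()
results = leave_one_year_out(rows)
ks = [k for _, k, _ in results]
errors = [e for _, _, e in results]
print(f"mean tuned k = {statistics.mean(ks):.4f}")
print(f"RMSE = {statistics.mean(errors):.2f} +/- {statistics.stdev(errors):.2f} MJ m-2 day-1")
```

A single data split would yield only one of these (k, RMSE) pairs; repeating the hold-out across years is what exposes the spread in both the tuned parameter and the predictive accuracy.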