Modeling techniques used in digital soil carbon mapping encompass a variety of algorithms to address spatial prediction problems such as spatial non-stationarity, nonlinearity and multi-colinearity. A given study site can… Click to show full abstract
Modeling techniques used in digital soil carbon mapping encompass a variety of algorithms to address spatial prediction problems such as spatial non-stationarity, nonlinearity and multi-colinearity. A given study site can inherit one or more such spatial prediction problems, necessitating the use of a combination of statistical learning algorithms to improve the accuracy of predictions. In addition, the training sample size may affect the accuracy of the model predictions. The effect of varying sample size on model accuracy has not been widely studied in pedometrics. To help fill this gap, we examined the behavior of multiple linear regression (MLR), geographically weighted regression (GWR), linear mixed models (LMMs), Cubist regression trees, quantile regression forests (QRFs), and extreme learning machine regression (ELMR) under varying sample sizes. The results showed that for the study site in the Hunter Valley, Australia, the accuracy of spatial prediction of soil carbon is more sensitive to training sample size compared to the model type used. The prediction accuracy initially increases exponentially with increasing sample size, eventually reaching a plateau. Different models reach their maximum predictive potential at different sample sizes. Furthermore, the uncertainty of model predictions decreases with increasing training sample sizes.
               
Click one of the above tabs to view related content.