Abstract Mapping and monitoring of indicators of soil cover, vegetation structure, and various native and non-native species is a critical aspect of rangeland management. With the advancement in satellite imagery… Click to show full abstract
Abstract Mapping and monitoring of indicators of soil cover, vegetation structure, and various native and non-native species is a critical aspect of rangeland management. With the advancement in satellite imagery as well as cloud storage and computing, the capability now exists to conduct planetary-scale analysis, including mapping of rangeland indicators. Combined with recent investments in the collection of large amounts of in situ data in the western U.S., new approaches using machine learning can enable prediction of surface conditions at times and places when no in situ data are available. However, little analysis has yet been done on how the temporal relevancy of training data influences model performance. Here, we have leveraged the Google Earth Engine (GEE) platform and a machine learning algorithm (Random Forest, after comparison with other candidates) to identify the potential impact of different sampling times (across months and years) on estimation of rangeland indicators from the Bureau of Land Management's (BLM) Assessment, Inventory, and Monitoring (AIM) and Landscape Monitoring Framework (LMF) programs. Our results indicate that temporally relevant training data improves predictions, though the training data need not be from the exact same month and year for a prediction to be temporally relevant. Moreover, inclusion of training data from the time when predictions are desired leads to lower prediction error but the addition of training data from other times does not contribute to overall model error. Using all of the available training data can lead to biases, toward the mean, for times when indicator values are especially high or low. However, for mapping purposes, limiting training data to just the time when predictions are desired can lead to poor predictions of values outside the spatial range of the training data for that period. We conclude that the best Random Forest prediction maps will use training data from all possible times with the understanding that estimates at the extremes will be biased.
               
Click one of the above tabs to view related content.