Abstract Land use regression (LUR) modeling is one of the most commonly used methods for predicting the spatial distribution of ambient pollutant concentrations. The number and location of sampling sites are two key factors affecting the accuracy of LUR, yet their effects are not well understood. To explore these effects, we collected NO2 monitoring data at high spatial density, with a total of 263 sites in Shijiazhuang city, China, and designed four sampling strategies: random sampling, regular sampling, attribute hierarchical sampling, and purposive sampling. Under each strategy, LUR models were built repeatedly with an increasing number of modeling sites (NMS). Results showed that NMS and site locations strongly affected model performance, especially when NMS was less than 30. As NMS increased, the accuracy of the LUR models gradually stabilized. For the study area, the minimum NMS required for LUR was 30, and the ideal number was 60. Purposive sampling was the most efficient strategy. R2 from model fitting and cross-validation was greatly inflated compared with hold-out validation, and the inflation was more pronounced at smaller NMS.
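The repeated-modeling procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: it covers only the random sampling strategy, uses plain ordinary least squares on all predictors in place of a full LUR variable-selection procedure, omits cross-validation, and runs on synthetic data. The predictor count, coefficients, noise level, and the function names `fit_ols`, `predict`, `r_squared`, and `evaluate_random_sampling` are all assumptions made for illustration. Only the total of 263 sites is taken from the abstract.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Fit an intercept plus linear coefficients by least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to predictor matrix X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def evaluate_random_sampling(X, y, nms, n_repeats, rng):
    """Repeatedly draw `nms` modeling sites at random, fit on them, and
    score on the remaining hold-out sites. Returns the mean modeling R2
    and the mean hold-out R2 over all repeats."""
    model_r2, holdout_r2 = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        train, test = idx[:nms], idx[nms:]
        coef = fit_ols(X[train], y[train])
        model_r2.append(r_squared(y[train], predict(coef, X[train])))
        holdout_r2.append(r_squared(y[test], predict(coef, X[test])))
    return float(np.mean(model_r2)), float(np.mean(holdout_r2))


# Synthetic stand-in data (NOT the study's measurements): 263 sites and
# 8 hypothetical land-use predictors, only 3 of which affect the response.
rng = np.random.default_rng(0)
n_sites, n_predictors = 263, 8
X = rng.normal(size=(n_sites, n_predictors))
true_coef = np.array([3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = 40.0 + X @ true_coef + rng.normal(scale=4.0, size=n_sites)

results = {}
for nms in (15, 30, 60, 120):
    results[nms] = evaluate_random_sampling(X, y, nms, n_repeats=200, rng=rng)
    fit_r2, test_r2 = results[nms]
    print(f"NMS={nms:3d}  modeling R2={fit_r2:.2f}  hold-out R2={test_r2:.2f}")
```

In this sketch the gap between modeling R2 and hold-out R2 is expected to shrink as NMS grows, which is the kind of inflation at small NMS that the abstract reports. The other three strategies would replace the `rng.permutation` draw with their own site-selection rule.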
               