Capture-recapture (CRC) is currently considered a promising method for integrating big data into official statistics. We previously applied CRC to estimate road freight transport with survey data (as the first capture) and road sensor data (as the second capture), using license plate and timestamp to identify recaptured vehicles. A considerable difference was found between the single-source, design-based survey estimate and the multiple-source, model-based CRC estimate. One possible explanation is underreporting in the survey, which is conceivable given the response burden of diary questionnaires. In this paper, we explore alternative explanations by quantifying their effect on the estimated amount of underreporting. In particular, we study the effects of 1) reporting errors, including a mismatch between the reported day of loading and the measured day of driving, 2) measurement errors, including false positives and OCR failure, 3) treating vehicles reported as not owned as nonresponse error instead of frame error, and 4) response mode. We conclude that these alternative hypotheses are unlikely to fully explain the difference between the survey estimate and the CRC estimate. Underreporting, therefore, remains a likely explanation, illustrating the power of combining survey and sensor data.
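
To make the two-capture idea concrete, the sketch below uses the textbook Lincoln-Petersen estimator on toy data. This is an illustrative assumption, not the model-based estimator used in the paper, whose details are not given in the abstract. The function name `lincoln_petersen`, the (license plate, day) record keys, and all values are hypothetical.

```python
def lincoln_petersen(first_capture, second_capture):
    """Textbook two-source capture-recapture estimate of population size.

    Each capture is a collection of hashable record keys; here a key is a
    hypothetical (license_plate, day) pair. Records present in both
    captures count as recaptured.
    """
    first, second = set(first_capture), set(second_capture)
    recaptured = first & second
    if not recaptured:
        raise ValueError("no recaptured records; the estimate is undefined")
    return len(first) * len(second) / len(recaptured)


# Toy data (made up): vehicle-days reported in the survey (first capture)
survey = {
    ("AA-01", "day1"), ("BB-02", "day1"),
    ("CC-03", "day2"), ("DD-04", "day2"),
}
# Toy data (made up): vehicle-days observed by road sensors (second capture)
sensor = {
    ("AA-01", "day1"), ("BB-02", "day1"), ("CC-03", "day2"),
    ("EE-05", "day1"), ("FF-06", "day2"), ("GG-07", "day2"),
}

estimate = lincoln_petersen(survey, sensor)  # 4 * 6 / 3

# A reporting error: the survey lists CC-03 on day1 although the sensor
# saw it on day2, so the record no longer links and the overlap shrinks.
survey_shifted = (survey - {("CC-03", "day2")}) | {("CC-03", "day1")}
estimate_shifted = lincoln_petersen(survey_shifted, sensor)  # 4 * 6 / 2
```

In this toy setting, a single day mismatch raises the estimate from 8 to 12 vehicle-days, which is why linkage and reporting errors of this kind are worth quantifying before the gap between the two estimates is attributed to underreporting.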
               