ABSTRACT How does sourcing affect which events are included in international relations datasets? The increasing number of machine-coded datasets offers the promise of coding a larger corpus of documents more quickly, but existing automated processes rely exclusively on databases of news reports for coverage. We exploit source variation in the UCDP GED dataset, which includes events from media reports and non-media sources, to explore the bias introduced by including only media reports in international relations datasets. Unlike previous studies, our approach allows us to compare subnational and cross-national determinants of bias. We find that media sources severely underreport events in African countries, and coverage is also associated with country-level factors like international trade and subnational factors like access to communication technology. Non-media sources cover a significant number of events not included in media sources; their inclusion can expand coverage and reduce bias in datasets.
               