ABSTRACT Many social scientists recognize that quantitative text analysis is a useful research methodology, but its application is still concentrated in documents written in European languages, especially English, and few… Click to show full abstract
ABSTRACT Many social scientists recognize that quantitative text analysis is a useful research methodology, but its application is still concentrated in documents written in European languages, especially English, and few sub-fields of political science, such as comparative politics and legislative studies. This seems to be due to the absence of flexible and cost-efficient methods that can be used to analyze documents in different domains and languages. Aiming to solve this problem, this paper proposes a semisupervised document scaling technique, called Latent Semantic Scaling (LSS), which can locate documents on various pre-defined dimensions. LSS achieves this by combining user-provided seed words and latent semantic analysis (word embedding). The article demonstrates its flexibility and efficiency in large-scale sentiment analysis of New York Times articles on the economy and Asahi Shimbun articles on politics. These examples show that LSS can produce results comparable to that of the Lexicoder Sentiment Dictionary (LSD) in both English and Japanese with only small sets of sentiment seed words. A new heuristic method that assists LSS users to choose a near-optimal number of singular values to obtain word vectors that best capture differences between documents on target dimensions is also presented.
               
Click one of the above tabs to view related content.