The prevalence of online user-generated content has attracted great interest in textual sentiment analysis, which provides a low-cost yet effective way to understand consumers and markets. A mainstream approach to sentiment analysis is to construct a classification model with Bag-of-Words (BoW) features, but the large vocabulary and the skewed distribution of term frequencies pose persistent research challenges, which are made even worse by the scarcity of valid sentiment labels. In light of this, we propose a novel method called Structural Holes based Sentiment Classifier (SHSC) for BoW-based sentiment classification. The key to SHSC is to reinforce the classification contribution of semantically rich words with clear-cut sentiment polarity. To this end, a word co-occurrence network is constructed to represent both high- and low-frequency words. The task of finding classification-inefficient words is then transformed into the identification of so-called bridge nodes, which occupy the positions of structural holes in the network. Two measures, information advantage rank and control advantage weight, are designed for this purpose; they are based on the proposed sentiment-label propagation and short-path computation algorithms, respectively. Finally, SHSC feeds this information as the key regularizers into a simple regression model to guide parameter learning. Extensive experiments on real-world text datasets demonstrate the advantage of SHSC over competitive benchmarks, particularly when sentiment labels are scarce. The effectiveness of uncovering structural holes for sentiment classification is further verified with robustness checks and demonstration cases.
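To make the idea of bridge nodes in a word co-occurrence network concrete, the following is a minimal sketch and not the SHSC method itself. It assumes document-level co-occurrence counts as edge weights and swaps in Burt's classic constraint score as a stand-in for the paper's information advantage rank and control advantage weight, which the abstract does not define. The function names `cooccurrence_network` and `burt_constraint` and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(docs):
    """Build an undirected weighted graph; an edge weight is the number
    of documents in which the two words appear together (an assumption)."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def burt_constraint(graph, node):
    """Burt's constraint: sum over neighbors j of
    (p_ij + sum_{q != i, j} p_iq * p_qj) ** 2.
    A low value suggests the node spans a structural hole."""
    def p(a, b):
        # Share of a's total tie strength that goes to b.
        total = sum(graph[a].values())
        return graph[a].get(b, 0) / total if total else 0.0

    neighbors = graph[node]
    score = 0.0
    for j in neighbors:
        indirect = sum(p(node, q) * p(q, j) for q in neighbors if q != j)
        score += (p(node, j) + indirect) ** 2
    return score


# Toy corpus (hypothetical): "phone" co-occurs with both the positive
# and the negative word cluster, so it bridges the two groups.
docs = [
    "great good love",
    "great good phone",
    "terrible bad hate",
    "terrible bad phone",
]
graph = cooccurrence_network(docs)
scores = {word: burt_constraint(graph, word) for word in list(graph)}
bridge = min(scores, key=scores.get)
print(bridge, scores[bridge])  # phone 0.5625
```

In this toy network the polarity-neutral word "phone" receives the lowest constraint, which matches the intuition that bridge words linking opposite-sentiment clusters contribute little to classification. How SHSC actually scores and regularizes such words is specified in the paper, not here.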
               