Many researchers, applications, and fields of study have investigated and used sentiment classification. Each sentiment analysis model (or method) has its own advantages and disadvantages, so opinion classification remains an important field of research. In this study, we propose a Valence-Totaling Model for Vietnamese (VTMfV), a new model for Vietnamese sentiment classification, to classify Vietnamese documents. First, we built a new Vietnamese sentiment dictionary containing sentiment-bearing Vietnamese words: negative, positive, and neutral words. The Jaccard Measure (JM) is a similarity measure between two words (or two vectors), and our Vietnamese sentiment dictionary, which we call “VSD_JM”, was created using JM. JM has been used in many studies of English sentiment classification; however, it has not yet been used in any study of Vietnamese sentiment classification. This work shows that JM can also be applied to Vietnamese sentiment analysis. VTMfV then uses VSD_JM to classify Vietnamese documents, and we processed all kinds of Vietnamese sentences. Finally, we used VTMfV to classify 30,000 Vietnamese documents, comprising 15,000 positive and 15,000 negative documents, and achieved an accuracy of 63.9% on our Vietnamese testing data set. VTMfV does not depend on a specific domain or on a training data set, and it has no training stage. Based on these results, VTMfV can be applied in different fields of Vietnamese natural language processing. In addition, VTMfV can be applied to many other languages, such as Spanish and Korean.
It can also be applied to large-scale (big data) sentiment classification in Vietnamese and can classify millions of Vietnamese documents.
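The abstract names two components: a dictionary built with the Jaccard Measure, JM(A, B) = |A ∩ B| / |A ∪ B|, and a valence-totaling classifier. The abstract does not say how JM scores become word valences. The minimal Python sketch below therefore rests on stated assumptions and should not be read as the authors' implementation. Each word is represented by the set of documents that contain it. A word's valence is taken as its JM with a positive seed word minus its JM with a negative seed word. The seed words "tốt" (good) and "xấu" (bad), the helper names `build_dictionary` and `classify`, and the assumption that text is already word-segmented are all hypothetical. A document is labelled by the sign of the sum of its word valences.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard measure |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_dictionary(corpus: List[List[str]], pos_seed: str, neg_seed: str) -> Dict[str, float]:
    """Hypothetical dictionary builder (an assumption, not the paper's exact method):
    each word is the set of document indices containing it, and its valence is
    JM(word, pos_seed) - JM(word, neg_seed)."""
    occurrences: Dict[str, Set[int]] = {}
    for doc_id, tokens in enumerate(corpus):
        for tok in tokens:
            occurrences.setdefault(tok, set()).add(doc_id)
    pos_docs = occurrences.get(pos_seed, set())
    neg_docs = occurrences.get(neg_seed, set())
    return {
        word: jaccard(docs, pos_docs) - jaccard(docs, neg_docs)
        for word, docs in occurrences.items()
    }


def classify(tokens: Iterable[str], dictionary: Dict[str, float]) -> str:
    """Valence totaling: sum the valences of known words; unknown words count as 0."""
    total = sum(dictionary.get(tok, 0.0) for tok in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy, pre-segmented corpus (illustrative only, not the paper's data).
    corpus = [
        ["phim", "hay", "tốt"],
        ["phim", "dở", "xấu"],
        ["nhạc", "hay", "tốt"],
        ["nhạc", "dở", "xấu"],
    ]
    vsd = build_dictionary(corpus, pos_seed="tốt", neg_seed="xấu")
    print(classify(["phim", "hay"], vsd))
    print(classify(["nhạc", "dở"], vsd))
```

On this toy corpus, "hay" (interesting) receives valence 1.0 and "dở" (poor) receives -1.0. Topic words such as "phim" (film) co-occur equally with both seeds and score 0.0. No training stage is involved, which matches the abstract's description. A real system would also need proper Vietnamese word segmentation and the sentence-level handling the authors mention, and neither is modelled here.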