A Valence-Totaling Model for Vietnamese sentiment classification

Bibliographic details
Published in: Evolving systems 2019-09, Vol.10 (3), p.453-499
Main authors: Phu, Vo Ngoc; Chau, Vo Thi Ngoc; Tran, Vo Thi Ngoc; Duy, Dat Nguyen; Duy, Khanh Ly Doan
Format: Article
Language: eng
Online access: Full text
Description
Summary: Sentiment classification has been studied and applied widely across research fields and applications, and each sentiment analysis model or method has its own advantages and disadvantages, which makes opinion classification an important field of research. In this study, we propose a Valence-Totaling Model for Vietnamese (VTMfV), a new model for Vietnamese sentiment classification, to classify Vietnamese documents. First, we built a new Vietnamese sentiment dictionary containing sentiment-bearing Vietnamese words: negative, positive and neutral. The dictionary was created using the Jaccard Measure (JM), a similarity measure between two words (or two vectors), and we call it “VSD_JM”. JM has been used in many studies of English sentiment classification, but it had not yet been used in any study of Vietnamese sentiment classification; from now on, JM can be applied to research on Vietnamese sentiment analysis. VTMfV then uses VSD_JM to classify Vietnamese documents, and it handles all kinds of Vietnamese sentences. Finally, we used VTMfV to classify 30,000 Vietnamese documents, comprising 15,000 positive and 15,000 negative documents, and achieved an accuracy of 63.9% on this Vietnamese testing data set. VTMfV does not depend on a specific domain, nor on a training data set: the model has no training stage. Based on these results, VTMfV can be applied in different fields of Vietnamese natural language processing. In addition, VTMfV can be applied to many other languages, such as Spanish and Korean. It can also be applied to sentiment classification of big data sets in Vietnamese and can classify millions of Vietnamese documents.
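The two ideas named in the summary, a Jaccard similarity and a classification by totaling word valences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the set-based form of the Jaccard Measure, the toy dictionary and its valence values are all assumptions made for illustration, and the sketch ignores sentence structure such as negation.

```python
def jaccard(a, b):
    """Jaccard measure |A ∩ B| / |A ∪ B| between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are given similarity 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy dictionary: the valence values are made up for illustration
# and are not taken from VSD_JM.
TOY_DICTIONARY = {
    "tốt": 1.0,   # good
    "đẹp": 0.5,   # beautiful
    "xấu": -0.5,  # ugly
    "tệ": -1.0,   # bad
}


def classify(tokens, dictionary):
    """Total the valences of the known tokens and label the document by the sign."""
    total = sum(dictionary.get(token, 0.0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(jaccard({1, 2, 3}, {2, 3, 4}))
    print(classify(["phim", "tốt", "đẹp"], TOY_DICTIONARY))
```

Because the label depends only on dictionary lookups and a sum, a sketch like this needs no training stage, which matches the property the summary claims for VTMfV.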
ISSN: 1868-6478
1868-6486
DOI: 10.1007/s12530-017-9187-7