Combination of Lexical Resources and Support Vector Machine for Film Sentiment Analysis

Bibliographic Details
Published in:	sinkron 2024-07, Vol.8 (3), p.1526-1538
Main Authors:	Agustina, Putri; Putri, Raissa Amanda
Format:	Article
Language:	English
Online Access:	Full text

Description
Summary:	Text data generated by internet users holds potentially valuable information that can be mined for new insights. One strategy for extracting information from a text data set is to classify texts into predetermined categories based on existing data. Text classification is a branch of text mining. A popular approach uses the Support Vector Machine (SVM) classification algorithm, which separates data into different classes. However, SVM classifiers can struggle to capture the context of a text because of unclear wording, varying sentence structures, or ambiguity in interpretation. To address this problem, combining SVM classification with lexical resources can be an effective solution. In this research framework, the first step is to obtain data, in this case a film review dataset taken from kaggle.com. The data is then preprocessed, and the preprocessed data is split 80:20. The 80% training portion is used to determine polarity, and this lexicon-labeled training data is used to train the SVM model. Based on the modeling results, the overall model accuracy, calculated from the confusion matrix, is around 85%. Precision, the proportion of predictions for a class that are correct, reached 88% for positive predictions, 80% for negative predictions, and 0% for neutral predictions. These results indicate that the Lexicon+SVM model performs well, with an accuracy of 85%.
ISSN:	2541-044X 2541-2019
DOI:	10.33395/sinkron.v8i3.13733
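
The summary above names three steps that a short example can make concrete: lexicon-based polarity labeling, an 80:20 split, and accuracy and per-class precision read from a confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation. The record does not give the lexicon, the features, or the SVM settings, so the toy LEXICON, all function names, and the example labels are hypothetical, and the SVM training step itself is left out (it would normally come from a machine learning library).

```python
import random
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score); not the paper's lexicon.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "boring": -1, "awful": -1}


def lexicon_label(text):
    """Sum the lexicon scores of the tokens and map the total to a label."""
    score = sum(LEXICON.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_80_20(items, seed=0):
    """Shuffle a copy of the items and return (train, test) in an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(matrix):
    """Share of all predictions whose predicted label equals the true label."""
    total = sum(matrix.values())
    correct = sum(n for (true, pred), n in matrix.items() if true == pred)
    return correct / total if total else 0.0


def precision(matrix, label):
    """Correct predictions of `label` divided by all predictions of `label`."""
    predicted = sum(n for (_, pred), n in matrix.items() if pred == label)
    if predicted == 0:
        # Assumed convention: a class that is never predicted gets precision 0.
        return 0.0
    return matrix[(label, label)] / predicted


# Toy labels for illustration only; these are not results from the article.
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "positive", "negative", "negative"]
matrix = confusion_matrix(y_true, y_pred)

for label in ("positive", "negative", "neutral"):
    print(label, round(precision(matrix, label), 2))
print("accuracy", round(accuracy(matrix), 2))
```

In this toy run the neutral class is never predicted, so its precision is 0 under the assumed convention. Whether the 0% neutral precision reported in the summary arose this way or from neutral predictions that were all wrong is not stated in the record.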