Effect of Random Splitting and Cross Validation for Indonesian Opinion Mining using Machine Learning Approach

Bibliographic Details
Published in:	International journal of advanced computer science & applications 2022, Vol.13 (9)
Main authors:	Purba, Mariana; Ermatita, Ermatita; Abdiansah, Abdiansah; Noprisson, Handrie; Ayumi, Vina; Setiawan, Hadiguna; Salamah, Umniy; Yadi, Yadi
Format:	Article
Language:	English
Subjects:	Accuracy; Data collection; Data mining; Feature extraction; Machine learning; Sentiment analysis; Splitting; Support vector machines
Online access:	Full text

Description
Summary:	Opinion mining has been a prominent research topic in Indonesia, but many questions remain unanswered. Most past research has focused on machine learning methods and models. A comparison of the effects of random splitting and cross-validation on processing performance is still needed. This study applies machine learning models to opinion mining on Indonesian-language text, using both a random splitting and a cross-validation approach. The research consists of five stages: data collection, pre-processing, feature extraction, training and testing, and evaluation. In the experiments, the TF-IDF feature performs better than the Count-Vectorizer (CV) feature on Indonesian text. The best accuracy, 81%, is obtained with TF-IDF as the feature and a Support Vector Machine (SVM) as the classifier, evaluated with cross-validation. The results also show that cross-validation improves accuracy compared with random splitting.
ISSN:	2158-107X 2156-5570
DOI:	10.14569/IJACSA.2022.0130917
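
Note on the two evaluation protocols named in the summary: the difference between random splitting (a single hold-out split) and k-fold cross-validation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names are hypothetical, the labels are toy data, and a majority-class placeholder stands in for the TF-IDF and SVM pipeline the authors describe.

```python
import random
from statistics import mean


def random_split(n, test_ratio=0.2, seed=0):
    """Single hold-out split: shuffle the indices once and cut off a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(n * test_ratio)))
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k=5, seed=0):
    """k-fold cross-validation: each sample lands in the test fold exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def majority_accuracy(labels, train, test):
    """Placeholder classifier (assumption): predict the most common training label."""
    train_labels = [labels[i] for i in train]
    # sorted() makes tie-breaking deterministic
    majority = max(sorted(set(train_labels)), key=train_labels.count)
    return mean(1.0 if labels[i] == majority else 0.0 for i in test)


# Toy labels standing in for sentiment classes (not data from the article).
labels = ["pos"] * 12 + ["neg"] * 8

# Random splitting: one score from one split.
train, test = random_split(len(labels))
holdout_acc = majority_accuracy(labels, train, test)

# Cross-validation: one score per fold, then averaged.
cv_scores = [majority_accuracy(labels, tr, te)
             for tr, te in k_fold_splits(len(labels))]
cv_acc = mean(cv_scores)

print(f"hold-out accuracy: {holdout_acc:.2f}")
print(f"5-fold mean accuracy: {cv_acc:.2f}")
```

The hold-out score depends on which samples happen to fall into the single test set, while the cross-validated score averages over folds that together cover every sample. The sketch shows only the mechanics of the two protocols and does not reproduce the accuracy figures reported in the summary.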