Hyperparameter Tuning of Semi-Supervised Learning for Indonesian Text Annotation

Bibliographic details
Published in: International journal of advanced computer science & applications, 2023, Vol. 14 (9)
Main authors: Khomsah, Siti; Cahyana, Nur Heri; Aribowo, Agus Sasmito
Format: Article
Language: English
Description
Abstract: A crucial step in sentiment analysis is annotation, that is, labeling the data. This step is typically performed by linguists, because the nuanced meaning of a text strongly influences its contextual interpretation. When the volume of data is large, annotation becomes time-consuming and costly. To address these challenges, semi-supervised learning (SSL) annotation, which combines human annotators with artificial intelligence algorithms, offers a promising solution. Building an accurate SSL annotator requires exploring the best architecture, including the combination of machine learning algorithm and SSL mechanism. This research aims to construct a semi-supervised text annotation model by tuning the hyperparameters of the machine learning algorithm to obtain the most accurate model. The study used Support Vector Machine (SVM) and Random Forest algorithms to build the semi-supervised annotator, and applied Grid Search and Random Search to tune the hyperparameters of both algorithms. The semi-supervised annotation model was then applied to annotate Indonesian texts. The results show that hyperparameter tuning improves SSL performance beyond what the default parameters achieve. The experiments also show that SSL annotation with an SVM tuned by Grid Search and by Random Search is more robust than SSL annotation with Random Forest. Hyperparameter tuning also remains robust when the training data contain many manual labeling errors made by experts.
ISSN: 2158-107X, 2156-5570
DOI: 10.14569/IJACSA.2023.0140927
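The abstract does not spell out the exact SSL mechanism or search spaces. The sketch below shows one common way to combine the pieces it names. Hyperparameters are tuned on a small human-labeled seed set with Grid Search and Random Search. The winning configuration then drives a self-training loop that pseudo-labels confident unlabeled examples. It rests on several assumptions:

- A linear SVM trained by Pegasos-style sub-gradient descent stands in for the paper's SVM.
- Synthetic 2-D points stand in for vectorized Indonesian texts.
- The hyperparameters `lam` and `epochs`, the margin-based confidence `threshold`, and every function name are hypothetical.
- Random Forest is not shown.

```python
import itertools
import random


def make_blobs(n, seed):
    """Hypothetical toy data: two Gaussian blobs standing in for vectorized texts."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([1, -1])
        # Two features plus a constant 1.0 so the weight vector also learns a bias.
        x = [rng.gauss(1.5 * y, 1.0), rng.gauss(1.5 * y, 1.0), 1.0]
        data.append((x, y))
    return data


def decision(w, x):
    """Signed score of a linear model; the sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x))


def accuracy(w, data):
    return sum((decision(w, x) >= 0) == (y == 1) for x, y in data) / len(data)


def train_linear_svm(data, lam, epochs, seed=0):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, t = [0.0] * len(data[0][0]), 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * decision(w, x)
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # hinge-loss violation: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def grid_search(train, val, grid):
    """Grid Search: try every (lam, epochs) pair, keep the best validation accuracy."""
    best = None
    for lam, epochs in itertools.product(grid["lam"], grid["epochs"]):
        acc = accuracy(train_linear_svm(train, lam, epochs), val)
        if best is None or acc > best[0]:
            best = (acc, {"lam": lam, "epochs": epochs})
    return best


def random_search(train, val, n_iter, seed=0):
    """Random Search: sample n_iter configurations (lam on a log scale)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {"lam": 10 ** rng.uniform(-4, 0), "epochs": rng.randint(1, 20)}
        acc = accuracy(train_linear_svm(train, **params), val)
        if best is None or acc > best[0]:
            best = (acc, params)
    return best


def self_train(labeled, unlabeled, params, threshold=1.0, max_rounds=5):
    """Self-training: pseudo-label confident unlabeled points, then retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    w = train_linear_svm(labeled, **params)
    for _ in range(max_rounds):
        confident = [x for x in pool if abs(decision(w, x)) >= threshold]
        if not confident:
            break
        labeled += [(x, 1 if decision(w, x) >= 0 else -1) for x in confident]
        pool = [x for x in pool if abs(decision(w, x)) < threshold]
        w = train_linear_svm(labeled, **params)
    return w, len(pool)


data = make_blobs(400, seed=42)
train, val, test = data[:40], data[40:80], data[300:]  # small "human-labeled" seed
unlabeled = [x for x, _ in data[80:300]]  # labels hidden from the model

grid = {"lam": [1e-3, 1e-2, 1e-1, 1.0], "epochs": [1, 5, 10]}
g_acc, g_params = grid_search(train, val, grid)
r_acc, r_params = random_search(train, val, n_iter=12)
best_params = g_params if g_acc >= r_acc else r_params

w, still_unlabeled = self_train(train, unlabeled, best_params)
print("grid:", g_acc, g_params)
print("random:", r_acc, r_params)
print("self-trained test accuracy:", accuracy(w, test))
```

Grid Search costs grow with the product of the grid sizes. Random Search fixes the budget (`n_iter`) regardless of how many hyperparameters are searched. In practice a library implementation would be the natural choice, for example scikit-learn's `GridSearchCV`, `RandomizedSearchCV`, `SVC`, `RandomForestClassifier`, and `SelfTrainingClassifier`. The hand-written version only makes each step visible.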