The Impact of Oversampling and Undersampling on Aspect-Based Sentiment Analysis of Indramayu Tourism Using Logistic Regression

Bibliographic Details
Published in: Revue d'Intelligence Artificielle 2024-06, Vol.38 (3), p.795-804
Main authors: Chamidah, Nurul, Widiyanto, Didit, Seta, Henki Bayu, Aziz, Azwa Abdul
Format: Article
Language: eng ; fre
Description
Summary: Aspect-based sentiment analysis classifies the sentiment polarity of opinionated text with respect to the aspect it addresses. Imbalanced data is a significant challenge that can degrade classification performance. In machine learning, strategies such as oversampling and undersampling can be applied to correct this imbalance. This study investigates the impact of these data balancing techniques on aspect-based sentiment analysis, with the aim of improving classification performance. To this end, SMOTE, random oversampling, and random undersampling are combined with logistic regression for multi-label classification. The data was obtained from Google Reviews submitted by individuals who visited the beach in Indramayu, and was annotated according to tourism-related factors and the sentiments users expressed. The data was then preprocessed and divided into training and test datasets, with 60% used for training and the remaining 40% for testing. During model training, the data was balanced with the oversampling and undersampling techniques, and Logistic Regression with Stochastic Gradient Descent Optimization was used as the learning method. The resulting model was then evaluated on the test dataset. The evaluation results indicate that oversampling led to a considerable improvement in performance compared with no data balancing. These findings provide a comparison of balancing techniques for tourism sentiment analysis models that suffer from imbalanced datasets. Consequently, oversampling can be considered when developing aspect-based sentiment analysis models for the tourism industry.
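The balancing step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, treats each review as having a single sentiment label rather than the multi-label setup of the study, and leaves out SMOTE and the logistic regression model. The function names and the toy reviews are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_train_test(samples, train_fraction=0.6, seed=0):
    """Shuffle the samples and split them, 60% for training by default."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def _group_by_label(samples):
    """Collect (text, label) pairs into one list per label."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append((text, label))
    return groups


def random_oversample(samples, seed=0):
    """Duplicate randomly chosen minority samples until every class
    is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_undersample(samples, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical, imbalanced toy data: 16 positive and 4 negative reviews.
reviews = [(f"positive review {i}", "positive") for i in range(16)]
reviews += [(f"negative review {i}", "negative") for i in range(4)]

# Balance only the training portion; the test portion stays untouched.
train, test = split_train_test(reviews)
for name, balance in [("oversampled", random_oversample),
                      ("undersampled", random_undersample)]:
    counts = Counter(label for _, label in balance(train))
    print(name, dict(counts))
```

Oversampling keeps every original training review and adds duplicates of the minority class, while undersampling discards majority-class reviews. Only the training split is resampled here, so the test set keeps its original class distribution.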
ISSN: 0992-499X, 1958-5748
DOI: 10.18280/ria.380306