Construction of a training dataset for a sentiment analysis model of dairy products tweets in Brazil

Bibliographic details
Published in: Social Network Analysis and Mining, 2024-04, Vol. 14 (1), p. 85, Article 85
Main authors: da Silva Nogueira, Thallys; Siqueira, Kennya Beatriz; Goliatt, Priscila Vanessa Zabala Capriles
Format: Article
Language: English
Description
Summary: Creating specific datasets for machine learning models is a frequent and challenging task, requiring considerable effort in sample collection and maintaining a balanced representation of each class. In this study, our objective was to create a training dataset for a sentiment analysis model by combining results obtained from 5 natural language processing tools through 3 distinct approaches, aiming to automatically label various tweets in the negative, neutral, and positive classes. Additionally, we applied data balancing techniques to assess different methods' impacts on the sentiment analysis models' ability to generalize classes to previously unseen samples. The results demonstrated that the three approaches used to combine tool results and apply balancing techniques provided significantly superior outcomes compared to models with imbalanced datasets. These advancements enabled sentiment analysis models to achieve greater precision and generalization capacity for novel samples. These findings underscore the importance of considering effective data balancing strategies when creating training datasets for machine learning applications, especially in tasks sensitive to class imbalance, such as sentiment analysis. This enhanced approach is crucial to improving the performance and applicability of sentiment analysis models in real-world scenarios, providing more precise data analyses that unveil valuable insights in digital marketing.
ISSN: 1869-5469, 1869-5450
DOI: 10.1007/s13278-024-01254-5
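
Illustrative note: the summary mentions combining the outputs of several NLP tools to label tweets automatically, followed by data balancing. It names neither the tools nor the three combination approaches, so the sketch below is not the authors' method. It assumes one plausible combination rule (majority vote, with ties falling back to "neutral") and one common balancing technique (random oversampling). The function names `majority_label` and `oversample`, the tie rule, and the sample tweets are all hypothetical.

```python
import random
from collections import Counter


def majority_label(tool_labels, tie_label="neutral"):
    """Return the label most tools agree on.

    Assumption: when the two most frequent labels are tied,
    fall back to tie_label. The paper may resolve ties differently.
    """
    counts = Counter(tool_labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def oversample(samples, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class.

    samples is a list of (text, label) pairs.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    by_class = {}
    for text, label in samples:
        by_class.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical outputs of five tools for three made-up tweets.
tool_outputs = {
    "tweet A": ["positive", "positive", "neutral", "positive", "negative"],
    "tweet B": ["positive", "neutral", "positive", "positive", "positive"],
    "tweet C": ["negative", "negative", "neutral", "negative", "negative"],
}
labelled = [(text, majority_label(votes)) for text, votes in tool_outputs.items()]
balanced = oversample(labelled)
```

Oversampling is only one of several balancing options; undersampling the majority class or generating synthetic samples are alternatives, and the summary does not say which techniques were compared.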