A Hybrid Ensemble Approach for Greek Text Classification Based on Multilingual Models

The present study explores the field of text classification in the Greek language. A novel ensemble classification scheme based on generated embeddings from Greek text made by the multilingual capabilities of the E5 model is presented. Our approach incorporates partial transfer learning by using pre...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Big data and cognitive computing 2024-10, Vol.8 (10), p.137
Hauptverfasser: Liapis, Charalampos M., Kyritsis, Konstantinos, Perikos, Isidoros, Spatiotis, Nikolaos, Paraskevas, Michael
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The present study explores the field of text classification in the Greek language. A novel ensemble classification scheme based on generated embeddings from Greek text made by the multilingual capabilities of the E5 model is presented. Our approach incorporates partial transfer learning by using pre-trained models to extract embeddings, enabling the evaluation of classical classifiers on Greek data. Additionally, we enhance the predictive capability while maintaining the costs low by employing a soft voting combination scheme that exploits the strengths of XGBoost, K-nearest neighbors, and logistic regression. This method significantly improves all classification metrics, demonstrating the superiority of ensemble techniques in handling the complexity of Greek textual data. Our study contributes to the field of natural language processing by proposing an effective ensemble framework for the categorization of Greek texts, leveraging the advantages of both traditional and modern machine learning techniques. This framework has the potential to be applied to other less-resourced languages, thereby broadening the impact of our research beyond Greek language processing.
ISSN:2504-2289
2504-2289
DOI:10.3390/bdcc8100137