HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource Tweet Data for Sentiment Analysis
Format: Article
Language: English
Abstract: We present our findings for SemEval-2023 Task 12, a shared task on
sentiment analysis for low-resource African languages using Twitter datasets.
The task featured three subtasks: subtask A is monolingual sentiment
classification with 12 tracks, each covering a single language; subtask B is
multilingual sentiment classification using the tracks from subtask A; and
subtask C is zero-shot sentiment classification. We present results and
findings for all three subtasks and release our code on GitHub. Our goal is to
leverage low-resource tweet data using the pre-trained Afro-xlmr-large,
AfriBERTa-Large, Bert-base-arabic-camelbert-da-sentiment (Arabic-camelbert),
Multilingual-BERT (mBERT), and BERT models for sentiment analysis of 14 African
languages. The datasets for these subtasks consist of gold-standard,
multi-class labeled tweets in these languages. Our results show that the
Afro-xlmr-large model outperformed the other models on most of the language
datasets. In addition, the Nigerian languages (Hausa, Igbo, and Yoruba)
achieved better performance than the other languages, which can be attributed
to the larger volume of data available for them.
DOI: 10.48550/arxiv.2304.13634
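The three subtasks in the abstract differ mainly in how the language tracks are combined for training and evaluation. The Python sketch below illustrates this under stated assumptions. The toy tweets, the language codes `ha`, `ig`, and `yo`, the positive/negative/neutral label set, and the helpers `monolingual`, `multilingual`, `zero_shot`, and `language_volume` are all hypothetical. Subtask C is modeled simply as holding one language out of training. This is not the authors' released code.

```python
from collections import Counter

# Hypothetical toy data: language code -> list of (tweet, label) pairs.
# Codes, tweets, and the three-way label set are illustrative assumptions.
TRACKS = {
    "ha": [("tweet ha-1", "positive"), ("tweet ha-2", "negative")],
    "ig": [("tweet ig-1", "neutral")],
    "yo": [("tweet yo-1", "positive"), ("tweet yo-2", "neutral"), ("tweet yo-3", "negative")],
}


def monolingual(tracks, lang):
    """Subtask A: a single language track on its own, tagged with its language."""
    return [(text, label, lang) for text, label in tracks[lang]]


def multilingual(tracks):
    """Subtask B: pool all monolingual tracks, keeping each example's source language."""
    return [(text, label, lang)
            for lang, rows in tracks.items()
            for text, label in rows]


def zero_shot(tracks, target):
    """Subtask C (assumed setup): train on every other track, test only on `target`."""
    train = [ex for ex in multilingual(tracks) if ex[2] != target]
    test = monolingual(tracks, target)
    return train, test


def language_volume(tracks):
    """Count labeled tweets per language, the data volume the abstract links to performance."""
    return Counter({lang: len(rows) for lang, rows in tracks.items()})


if __name__ == "__main__":
    train, test = zero_shot(TRACKS, "ig")
    print(len(multilingual(TRACKS)), len(train), len(test))  # 6 5 1
    print(language_volume(TRACKS).most_common())  # [('yo', 3), ('ha', 2), ('ig', 1)]
```

Keeping the language tag on every pooled example lets the same multilingual set be filtered back into per-language or held-out splits without re-reading the data.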