MphayaNER: Named Entity Recognition for Tshivenda
Main authors: | , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Summary: | Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
fine-tuning state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available. |
---|---|
DOI: | 10.48550/arxiv.2304.03952 |