MphayaNER: Named Entity Recognition for Tshivenda
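Main authors: | Mbuvha, Rendani ; Adelani, David I ; Mutavhatsindi, Tendani ; Rakhuhu, Tshimangadzo ; Mauda, Aluwani ; Maumela, Tshifhiwa Joshua ; Masindi, Andisani ; Rananga, Seani ; Marivate, Vukosi ; Marwala, Tshilidzi |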
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Mbuvha, Rendani ; Adelani, David I ; Mutavhatsindi, Tendani ; Rakhuhu, Tshimangadzo ; Mauda, Aluwani ; Maumela, Tshifhiwa Joshua ; Masindi, Andisani ; Rananga, Seani ; Marivate, Vukosi ; Marwala, Tshilidzi |
description | Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
fine-tuning state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available. |
doi_str_mv | 10.48550/arxiv.2304.03952 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2304_03952</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2304_03952</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-8116151876874d127f684cf60136e1bf73e867429715284803ed595c8ec077c43</originalsourceid><addsrcrecordid>eNotzr1uwjAUQGEvDBX0ATrVL5DU13_3hq1CaUECKqHskXFssAQJChFq3r4qMJ3t6GPsDUSuyRjx4frfdMulEjoXqjDyhcHmcnSj25a7Od-6c2h42Q5pGPku-O7QpiF1LY9dz6vrMd1C27gZm0R3uobXZ6es-iqrxTJb_3yvFp_rzFmUGQFYMEBoCXUDEqMl7aMVoGyAfUQVyKKWBYKRpEmo0JjCeApeIHqtpuz9sb2b60ufzq4f6397fberP47vO-Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>MphayaNER: Named Entity Recognition for Tshivenda</title><source>arXiv.org</source><creator>Mbuvha, Rendani ; Adelani, David I ; Mutavhatsindi, Tendani ; Rakhuhu, Tshimangadzo ; Mauda, Aluwani ; Maumela, Tshifhiwa Joshua ; Masindi, Andisani ; Rananga, Seani ; Marivate, Vukosi ; Marwala, Tshilidzi</creator><creatorcontrib>Mbuvha, Rendani ; Adelani, David I ; Mutavhatsindi, Tendani ; Rakhuhu, Tshimangadzo ; Mauda, Aluwani ; Maumela, Tshifhiwa Joshua ; Masindi, Andisani ; Rananga, Seani ; Marivate, Vukosi ; Marwala, Tshilidzi</creatorcontrib><description>Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
\textit{fine-tuning} state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available.</description><identifier>DOI: 10.48550/arxiv.2304.03952</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2023-04</creationdate><rights>http://creativecommons.org/licenses/by-nc-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2304.03952$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2304.03952$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mbuvha, Rendani</creatorcontrib><creatorcontrib>Adelani, David I</creatorcontrib><creatorcontrib>Mutavhatsindi, Tendani</creatorcontrib><creatorcontrib>Rakhuhu, Tshimangadzo</creatorcontrib><creatorcontrib>Mauda, Aluwani</creatorcontrib><creatorcontrib>Maumela, Tshifhiwa Joshua</creatorcontrib><creatorcontrib>Masindi, Andisani</creatorcontrib><creatorcontrib>Rananga, Seani</creatorcontrib><creatorcontrib>Marivate, Vukosi</creatorcontrib><creatorcontrib>Marwala, Tshilidzi</creatorcontrib><title>MphayaNER: Named Entity Recognition for Tshivenda</title><description>Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
\textit{fine-tuning} state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzr1uwjAUQGEvDBX0ATrVL5DU13_3hq1CaUECKqHskXFssAQJChFq3r4qMJ3t6GPsDUSuyRjx4frfdMulEjoXqjDyhcHmcnSj25a7Od-6c2h42Q5pGPku-O7QpiF1LY9dz6vrMd1C27gZm0R3uobXZ6es-iqrxTJb_3yvFp_rzFmUGQFYMEBoCXUDEqMl7aMVoGyAfUQVyKKWBYKRpEmo0JjCeApeIHqtpuz9sb2b60ufzq4f6397fberP47vO-Q</recordid><startdate>20230408</startdate><enddate>20230408</enddate><creator>Mbuvha, Rendani</creator><creator>Adelani, David I</creator><creator>Mutavhatsindi, Tendani</creator><creator>Rakhuhu, Tshimangadzo</creator><creator>Mauda, Aluwani</creator><creator>Maumela, Tshifhiwa Joshua</creator><creator>Masindi, Andisani</creator><creator>Rananga, Seani</creator><creator>Marivate, Vukosi</creator><creator>Marwala, Tshilidzi</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230408</creationdate><title>MphayaNER: Named Entity Recognition for Tshivenda</title><author>Mbuvha, Rendani ; Adelani, David I ; Mutavhatsindi, Tendani ; Rakhuhu, Tshimangadzo ; Mauda, Aluwani ; Maumela, Tshifhiwa Joshua ; Masindi, Andisani ; Rananga, Seani ; Marivate, Vukosi ; Marwala, Tshilidzi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-8116151876874d127f684cf60136e1bf73e867429715284803ed595c8ec077c43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Mbuvha, Rendani</creatorcontrib><creatorcontrib>Adelani, David I</creatorcontrib><creatorcontrib>Mutavhatsindi, Tendani</creatorcontrib><creatorcontrib>Rakhuhu, 
Tshimangadzo</creatorcontrib><creatorcontrib>Mauda, Aluwani</creatorcontrib><creatorcontrib>Maumela, Tshifhiwa Joshua</creatorcontrib><creatorcontrib>Masindi, Andisani</creatorcontrib><creatorcontrib>Rananga, Seani</creatorcontrib><creatorcontrib>Marivate, Vukosi</creatorcontrib><creatorcontrib>Marwala, Tshilidzi</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mbuvha, Rendani</au><au>Adelani, David I</au><au>Mutavhatsindi, Tendani</au><au>Rakhuhu, Tshimangadzo</au><au>Mauda, Aluwani</au><au>Maumela, Tshifhiwa Joshua</au><au>Masindi, Andisani</au><au>Rananga, Seani</au><au>Marivate, Vukosi</au><au>Marwala, Tshilidzi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MphayaNER: Named Entity Recognition for Tshivenda</atitle><date>2023-04-08</date><risdate>2023</risdate><abstract>Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
\textit{fine-tuning} state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available.</abstract><doi>10.48550/arxiv.2304.03952</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2304.03952 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2304_03952 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Computation and Language |
title | MphayaNER: Named Entity Recognition for Tshivenda |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T19%3A07%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MphayaNER:%20Named%20Entity%20Recognition%20for%20Tshivenda&rft.au=Mbuvha,%20Rendani&rft.date=2023-04-08&rft_id=info:doi/10.48550/arxiv.2304.03952&rft_dat=%3Carxiv_GOX%3E2304_03952%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |