AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili,...

Detailed description

Bibliographic details
Main authors:	Adewumi, Tosin; Adeyemi, Mofetoluwa; Anuoluwapo, Aremu; Peters, Bukola; Buzaaba, Happy; Samuel, Oyerinde; Rufai, Amina Mardiyyah; Ajibade, Benjamin; Gwadabe, Tajudeen; Traore, Mory Moussou Koulibaly; Ajayi, Tunde; Muhammad, Shamsuddeen; Baruwa, Ahmed; Owoicho, Paul; Ogunremi, Tolulope; Ngigi, Phylis; Ahia, Orevaoghene; Nasir, Ruqayya; Liwicki, Foteini; Liwicki, Marcus
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Adewumi, Tosin; Adeyemi, Mofetoluwa; Anuoluwapo, Aremu; Peters, Bukola; Buzaaba, Happy; Samuel, Oyerinde; Rufai, Amina Mardiyyah; Ajibade, Benjamin; Gwadabe, Tajudeen; Traore, Mory Moussou Koulibaly; Ajayi, Tunde; Muhammad, Shamsuddeen; Baruwa, Ahmed; Owoicho, Paul; Ogunremi, Tolulope; Ngigi, Phylis; Ahia, Orevaoghene; Nasir, Ruqayya; Liwicki, Foteini; Liwicki, Marcus
description	Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
doi_str_mv	10.48550/arxiv.2204.08083
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2204_08083</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2204_08083</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-e7f72de8aadf094309098064bd9746fd7fe7a175c3690e299ba1fbc64d1607ae3</originalsourceid><addsrcrecordid>eNotUMFqhDAUzKWHsu0H9NR8QLXRRGN6W-x2uyAsFKHQizz1RQLWSKLtLv35rranGYZhhhlC7iIWiixJ2CO4k_kK45iJkGUs49fkZ6udeT9-PNHcunH2VFtHd6ext2YyQ0dzZ70Pigudoaelg8FrdFCb3kzn1bzH4SJMxg7UavpsoLfdjJ6agRb2O3hDb2fX4ANdmhq4qLCEdehvyJWG3uPtP25I-bIr89egOO4P-bYIIJU8QKll3GIG0GqmBGeKqYylom6VFKlupUYJkUwaniqGsVI1RLpuUtFGKZOAfEPu_2LX9dXozCe4c7W8UK0v8F8fHFlz</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages</title><source>arXiv.org</source><creator>Adewumi, Tosin ; Adeyemi, Mofetoluwa ; Anuoluwapo, Aremu ; Peters, Bukola ; Buzaaba, Happy ; Samuel, Oyerinde ; Rufai, Amina Mardiyyah ; Ajibade, Benjamin ; Gwadabe, Tajudeen ; Traore, Mory Moussou Koulibaly ; Ajayi, Tunde ; Muhammad, Shamsuddeen ; Baruwa, Ahmed ; Owoicho, Paul ; Ogunremi, Tolulope ; Ngigi, Phylis ; Ahia, Orevaoghene ; Nasir, Ruqayya ; Liwicki, Foteini ; Liwicki, Marcus</creator><creatorcontrib>Adewumi, Tosin ; Adeyemi, Mofetoluwa ; Anuoluwapo, Aremu ; Peters, Bukola ; Buzaaba, Happy ; Samuel, Oyerinde ; Rufai, Amina Mardiyyah ; Ajibade, Benjamin ; Gwadabe, Tajudeen ; Traore, Mory Moussou Koulibaly ; Ajayi, Tunde ; Muhammad, Shamsuddeen ; Baruwa, Ahmed ; Owoicho, Paul ; Ogunremi, Tolulope ; Ngigi, Phylis ; Ahia, Orevaoghene ; Nasir, Ruqayya ; Liwicki, Foteini ; Liwicki, Marcus</creatorcontrib><description>Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. 
To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. 
We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.</description><identifier>DOI: 10.48550/arxiv.2204.08083</identifier><language>eng</language><subject>Computer Science - Computation and Language</subject><creationdate>2022-04</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2204.08083$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2204.08083$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Adewumi, Tosin</creatorcontrib><creatorcontrib>Adeyemi, Mofetoluwa</creatorcontrib><creatorcontrib>Anuoluwapo, Aremu</creatorcontrib><creatorcontrib>Peters, Bukola</creatorcontrib><creatorcontrib>Buzaaba, Happy</creatorcontrib><creatorcontrib>Samuel, Oyerinde</creatorcontrib><creatorcontrib>Rufai, Amina Mardiyyah</creatorcontrib><creatorcontrib>Ajibade, Benjamin</creatorcontrib><creatorcontrib>Gwadabe, Tajudeen</creatorcontrib><creatorcontrib>Traore, Mory Moussou Koulibaly</creatorcontrib><creatorcontrib>Ajayi, Tunde</creatorcontrib><creatorcontrib>Muhammad, Shamsuddeen</creatorcontrib><creatorcontrib>Baruwa, Ahmed</creatorcontrib><creatorcontrib>Owoicho, Paul</creatorcontrib><creatorcontrib>Ogunremi, Tolulope</creatorcontrib><creatorcontrib>Ngigi, Phylis</creatorcontrib><creatorcontrib>Ahia, Orevaoghene</creatorcontrib><creatorcontrib>Nasir, Ruqayya</creatorcontrib><creatorcontrib>Liwicki, Foteini</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><title>AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of 
Dialogues in Low-Resource, African Languages</title><description>Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. 
We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.</description><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotUMFqhDAUzKWHsu0H9NR8QLXRRGN6W-x2uyAsFKHQizz1RQLWSKLtLv35rranGYZhhhlC7iIWiixJ2CO4k_kK45iJkGUs49fkZ6udeT9-PNHcunH2VFtHd6ext2YyQ0dzZ70Pigudoaelg8FrdFCb3kzn1bzH4SJMxg7UavpsoLfdjJ6agRb2O3hDb2fX4ANdmhq4qLCEdehvyJWG3uPtP25I-bIr89egOO4P-bYIIJU8QKll3GIG0GqmBGeKqYylom6VFKlupUYJkUwaniqGsVI1RLpuUtFGKZOAfEPu_2LX9dXozCe4c7W8UK0v8F8fHFlz</recordid><startdate>20220417</startdate><enddate>20220417</enddate><creator>Adewumi, Tosin</creator><creator>Adeyemi, Mofetoluwa</creator><creator>Anuoluwapo, Aremu</creator><creator>Peters, Bukola</creator><creator>Buzaaba, Happy</creator><creator>Samuel, Oyerinde</creator><creator>Rufai, Amina Mardiyyah</creator><creator>Ajibade, Benjamin</creator><creator>Gwadabe, Tajudeen</creator><creator>Traore, Mory Moussou Koulibaly</creator><creator>Ajayi, Tunde</creator><creator>Muhammad, Shamsuddeen</creator><creator>Baruwa, Ahmed</creator><creator>Owoicho, Paul</creator><creator>Ogunremi, Tolulope</creator><creator>Ngigi, Phylis</creator><creator>Ahia, Orevaoghene</creator><creator>Nasir, Ruqayya</creator><creator>Liwicki, Foteini</creator><creator>Liwicki, Marcus</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20220417</creationdate><title>AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages</title><author>Adewumi, Tosin ; Adeyemi, Mofetoluwa ; Anuoluwapo, Aremu ; Peters, Bukola ; Buzaaba, Happy ; Samuel, Oyerinde ; Rufai, Amina Mardiyyah ; Ajibade, Benjamin ; Gwadabe, Tajudeen ; Traore, Mory Moussou Koulibaly ; Ajayi, Tunde ; Muhammad, Shamsuddeen ; Baruwa, Ahmed ; Owoicho, Paul ; Ogunremi, Tolulope ; Ngigi, Phylis ; 
Ahia, Orevaoghene ; Nasir, Ruqayya ; Liwicki, Foteini ; Liwicki, Marcus</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-e7f72de8aadf094309098064bd9746fd7fe7a175c3690e299ba1fbc64d1607ae3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Adewumi, Tosin</creatorcontrib><creatorcontrib>Adeyemi, Mofetoluwa</creatorcontrib><creatorcontrib>Anuoluwapo, Aremu</creatorcontrib><creatorcontrib>Peters, Bukola</creatorcontrib><creatorcontrib>Buzaaba, Happy</creatorcontrib><creatorcontrib>Samuel, Oyerinde</creatorcontrib><creatorcontrib>Rufai, Amina Mardiyyah</creatorcontrib><creatorcontrib>Ajibade, Benjamin</creatorcontrib><creatorcontrib>Gwadabe, Tajudeen</creatorcontrib><creatorcontrib>Traore, Mory Moussou Koulibaly</creatorcontrib><creatorcontrib>Ajayi, Tunde</creatorcontrib><creatorcontrib>Muhammad, Shamsuddeen</creatorcontrib><creatorcontrib>Baruwa, Ahmed</creatorcontrib><creatorcontrib>Owoicho, Paul</creatorcontrib><creatorcontrib>Ogunremi, Tolulope</creatorcontrib><creatorcontrib>Ngigi, Phylis</creatorcontrib><creatorcontrib>Ahia, Orevaoghene</creatorcontrib><creatorcontrib>Nasir, Ruqayya</creatorcontrib><creatorcontrib>Liwicki, Foteini</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Adewumi, Tosin</au><au>Adeyemi, Mofetoluwa</au><au>Anuoluwapo, Aremu</au><au>Peters, Bukola</au><au>Buzaaba, Happy</au><au>Samuel, Oyerinde</au><au>Rufai, Amina Mardiyyah</au><au>Ajibade, Benjamin</au><au>Gwadabe, Tajudeen</au><au>Traore, Mory Moussou Koulibaly</au><au>Ajayi, Tunde</au><au>Muhammad, Shamsuddeen</au><au>Baruwa, Ahmed</au><au>Owoicho, 
Paul</au><au>Ogunremi, Tolulope</au><au>Ngigi, Phylis</au><au>Ahia, Orevaoghene</au><au>Nasir, Ruqayya</au><au>Liwicki, Foteini</au><au>Liwicki, Marcus</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages</atitle><date>2022-04-17</date><risdate>2022</risdate><abstract>Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.</abstract><doi>10.48550/arxiv.2204.08083</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2204.08083
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2204_08083
source	arXiv.org
subjects	Computer Science - Computation and Language
title	AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T12%3A56%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AfriWOZ:%20Corpus%20for%20Exploiting%20Cross-Lingual%20Transferability%20for%20Generation%20of%20Dialogues%20in%20Low-Resource,%20African%20Languages&rft.au=Adewumi,%20Tosin&rft.date=2022-04-17&rft_id=info:doi/10.48550/arxiv.2204.08083&rft_dat=%3Carxiv_GOX%3E2204_08083%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true