AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages
Saved in:
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Dialogue generation is an important NLP task fraught with many challenges.
These challenges become more daunting for low-resource African languages. To
enable the creation of dialogue agents for African languages, we contribute the
first high-quality dialogue datasets for 6 African languages: Swahili, Wolof,
Hausa, Nigerian Pidgin English, Kinyarwanda, and Yorùbá. These datasets
consist of 1,500 turns each, which we translate from a portion of the English
multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the
effectiveness of modelling through transfer learning by utilizing
state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We
compare the models with a simple seq2seq baseline using perplexity. In
addition, we conduct human evaluation of single-turn conversations using
majority votes and measure inter-annotator agreement (IAA). We find that the
hypothesis that deep monolingual models learn some abstractions that generalize
across languages holds. We observe human-like conversations, to different
degrees, in 5 out of the 6 languages. The language with the most transferable
properties is Nigerian Pidgin English, with a human-likeness score of
78.1%, of which 34.4% are unanimous. We freely provide the datasets and host
the model checkpoints/demos on the HuggingFace hub for public access. |
DOI: | 10.48550/arxiv.2204.08083 |
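The abstract compares models against a seq2seq baseline using perplexity. As a minimal illustrative sketch (not the authors' evaluation code), perplexity is the exponential of the mean negative log-likelihood the model assigns to the reference tokens; lower is better:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood.

    `token_log_probs` holds the natural-log probabilities a language
    model assigns to each reference token in an evaluation set.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Sanity check: a model that is uniform over a 10-word vocabulary
# assigns probability 0.1 to every token, so its perplexity is ~10.
print(perplexity([math.log(0.1)] * 5))
```

The function name and input format here are illustrative assumptions; in practice the per-token log-probabilities would come from the evaluated model (e.g. DialoGPT or BlenderBot fine-tuned on the AfriWOZ data).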