YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text
Saved in:
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that this model outperforms several multilingually trained T5 models. Lastly, we show that more data and larger models lead to better diacritization for Yorùbá.
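The abstract casts diacritic restoration as a text-to-text task: the model takes undiacritized Yorùbá text as input and generates the fully diacritized form. A minimal sketch of that setup using the Hugging Face transformers seq2seq API is shown below; the checkpoint name and example sentence are placeholders for illustration, not the paper's released artifacts.

```python
# Sketch: diacritization as sequence-to-sequence generation with a T5-style model.
# The checkpoint name is hypothetical -- substitute the actual Yorùbá T5 model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "your-org/yoruba-t5-diacritizer"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Undiacritized input; the model is expected to restore tone marks and under-dots.
text = "bawo ni o se wa"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```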
DOI: 10.48550/arxiv.2412.20218