AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages


Detailed Description

Bibliographic Details
Authors: Wang, Jiayi, Adelani, David, Agrawal, Sweta, Masiak, Marek, Rei, Ricardo, Briakou, Eleftheria, Carpuat, Marine, He, Xuanli, Bourhim, Sofia, Bukula, Andiswa, Mohamed, Muhidin, Olatoye, Temitayo, Adewumi, Tosin, Mokayed, Hamam, Mwase, Christine, Kimotho, Wangui, Yuehgoh, Foutse, Aremu, Anuoluwapo, Ojo, Jessica, Muhammad, Shamsuddeen, Osei, Salomey, Omotayo, Abdul-Hakeem, Chukwuneke, Chiamaka, Ogayo, Perez, Hourrane, Oumaima, El Anigri, Salma, Ndolela, Lolwethu, Mangwana, Thabiso, Mohamed, Shafie, Ayinde, Hassan, Awoyomi, Oluwabusayo, Alkhaled, Lama, Al-Azzawi, Sana, Etori, Naome, Ochieng, Millicent, Siro, Clemencia, Kiragu, Njoroge, Muchiri, Eric, Kimotho, Wangari, Sakayo, Toadoum Sari, Wamba, Lyse Naomi, Abolade, Daud, Ajao, Simbiat, Shode, Iyanuoluwa, Macharm, Ricky, Iro, Ruqayya, Abdullahi, Saheed, Moore, Stephen, Opoku, Bernard, Akinjobi, Zainab, Afolabi, Abeeb, Obiefuna, Nnaemeka, Ogbu, Onyekachi, Ochieng’, Sam, Otiende, Verrah, Mbonu, Chinedu, Lu, Yao, Stenetorp, Pontus
Format: Conference proceedings
Language: English
Description
Abstract: Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed using n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET correlate better with human judgments; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages, by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create state-of-the-art MT evaluation metrics for African languages with respect to Spearman rank correlation with human judgments (0.441). © 2024 Association for Computational Linguistics

The evaluated language pairs (LPs) are: English-Egyptian Arabic (eng-arz), English-French (eng-fra, a control LP), English-Hausa (eng-hau), English-Igbo (eng-ibo), English-Kikuyu (eng-kik), English-Luo (eng-luo), English-Somali (eng-som), English-Swahili (eng-swh), English-Twi (eng-twi), English-isiXhosa (eng-xho), English-Yoruba (eng-yor), and Yoruba-English (yor-eng). Moreover, we extend our annotation collection to include domain-specific texts from the News, TED talks, Movies, and IT domains for English-Yoruba translations, which were established in prior research by Adelani et al. (2021) and Shode et al. (2022), ensuring a comprehensive and domain-varied evaluation.
Information on the language family groups to which the targeted African languages belong is provided in Table 4 of Appendix A.1.
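The headline result (0.441) is a Spearman rank correlation between metric scores and human judgments. As a minimal, self-contained sketch of how such a correlation is computed, the following pure-Python implementation ranks both score lists (averaging ranks for ties) and takes the Pearson correlation of the ranks; the scores in the demo are made-up illustrative values, not data from the paper:

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # find the run of tied values starting at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical segment-level metric scores vs. human DA ratings:
metric_scores = [1.0, 2.0, 3.0, 4.0]
human_ratings = [0.1, 0.4, 0.2, 0.9]
print(spearman(metric_scores, human_ratings))  # → 0.8
```

Tied values receive the average of their ranks, matching the convention used by standard statistics packages, so the function agrees with library implementations of Spearman's rho on data with ties.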
DOI: 10.18653/v1/2024.naacl-long.334