SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising...

Bibliographic Details
Published in: arXiv.org 2024-10
Main Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Miranda, Lester James V, Santoso, Jennifer, Aco, Elyanah, Akhdan Fadhilah, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Hudi, Frederikus, Railey Montalan, Ryan, Ignatius, Joanito Agili Lopo, Nixon, William, Karlsson, Börje F, Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Blaise Cruz, Jan Christian, Whitehouse, Chenxi, Ivan Halim Parmonangan, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Reynard Adha Ryanda, Hermawan, Sonny Lazuardi, Velasco, Dan John, Muhammad Dehan Al Kautsar, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Muhammad Farid Adilazuarda, Li, Haochen, Lee, Johanes, Damanhuri, R, Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Wei Qi Leong, Do, Quyet V, Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Ngee Chia Tai, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Genta Indra Winata, Zhang, Ruochen, Koto, Fajri, Zheng-Xin, Yong, Cahyawijaya, Samuel
Format: Article
Language: English
Online Access: Full text