Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Saved in:

Main authors: | , , , , , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Evaluations of Large Language Models (LLMs) on knowledge-intensive
tasks and factual accuracy often focus on high-resource languages, primarily
because datasets for low-resource languages (LRLs) are scarce. In this paper,
we present Uhura -- a new benchmark that focuses on two tasks in six
typologically diverse African languages, created via human translation of
existing English benchmarks. The first dataset, Uhura-ARC-Easy, is composed of
multiple-choice science questions. The second, Uhura-TruthfulQA, is a safety
benchmark testing the truthfulness of models on topics including health, law,
finance, and politics. We highlight the challenges of creating benchmarks with
highly technical content for LRLs and outline mitigation strategies. Our
evaluation reveals a significant performance gap between proprietary models
such as GPT-4o, o1-preview, and the Claude models, and open-source models like
Meta's LLaMA and Google's Gemma. Additionally, all models perform better in
English than in African languages. These results indicate that LLMs struggle
with answering scientific questions and are more prone to generating false
claims in low-resource African languages. Our findings underscore the necessity
of continuous improvement of multilingual LLM capabilities in LRL settings to
ensure safe and reliable use in real-world contexts. We open-source the Uhura
Benchmark and Uhura Platform to foster further research and development in NLP
for LRLs. |
DOI: | 10.48550/arxiv.2412.00948 |