CRAG -- Comprehensive RAG Benchmark

Bibliographic details
Published in:	arXiv.org 2024-11
Main authors:	Yang, Xiao; Sun, Kai; Hao Xin; Sun, Yushi; Bhalla, Nikita; Chen, Xiangsen; Choudhary, Sajal; Rongze Daniel Gui; Ziran Will Jiang; Jiang, Ziyu; Kong, Lingkun; Moran, Brian; Wang, Jiaqi; Xu, Yifan Ethan; An, Yan; Yang, Chenyu; Yuan, Eting; Zha, Hanwen; Tang, Nan; Chen, Lei; Scheffer, Nicolas; Liu, Yue; Shah, Nirav; Wanga, Rakesh; Kumar, Anuj; Wen-tau Yih; Xin Luna Dong
Format:	Article
Language:	English
Subjects:	Accuracy; Benchmarks; Knowledge representation; Large language models; Questions
Online access:	Full text

Description
Summary:	Retrieval-Augmented Generation (RAG) has recently emerged as a promising way to compensate for the knowledge gaps of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to cover a diverse array of questions across five domains and eight question categories, reflecting entity popularity that varies from popular to long-tail and temporal dynamism that ranges from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve
ISSN:	2331-8422