Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Format: | Article |
---|---|
Language: | English |
Abstract: | Did you try out the new Bing Search? Or maybe you fiddled around with Google
AI Overviews? These might sound familiar because the modern-day search stack
has recently evolved to include retrieval-augmented generation (RAG) systems.
They allow searching and incorporating real-time data into large language
models (LLMs) to provide a well-informed, attributed, concise summary in
contrast to the traditional search paradigm that relies on displaying a ranked
list of documents. Therefore, given these recent advancements, it is crucial to
have an arena to build, test, visualize, and systematically evaluate RAG-based
search systems. With this in mind, we propose the TREC 2024 RAG Track to foster
innovation in evaluating RAG systems. In our work, we lay out the steps we've
taken toward making this track a reality: we describe the details of our
reusable framework, Ragnarök, explain our choice and curation of the new
MS MARCO V2.1 collection, release the development topics for the track, and
standardize the I/O definitions that assist the end user. Next, using
Ragnarök, we identify and provide key industrial baselines such as OpenAI's
GPT-4o or Cohere's Command R+. Further, we introduce a web-based user interface
for an interactive arena that allows pairwise benchmarking of RAG systems
through crowdsourcing. We open-source our Ragnarök framework and baselines to
achieve a unified standard for future RAG systems. |
---|---|
DOI: | 10.48550/arxiv.2406.16828 |