BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Order full text
Summary: Retrieval-Augmented Generation (RAG) enhances Large Language Models with
external knowledge. In response to the recent popularity of generative LLMs,
many RAG approaches have been proposed, each involving an intricate set of
configuration choices such as evaluation datasets, collections, metrics,
retrievers, and LLMs. Inconsistent benchmarking poses a major challenge in
comparing approaches and understanding the impact of each component in the
pipeline. In this work, we study best practices that lay the groundwork for a
systematic evaluation of RAG and present BERGEN, an end-to-end library for
reproducible research that standardizes RAG experiments. In an extensive study
focusing on QA, we benchmark different state-of-the-art retrievers, rerankers,
and LLMs. Additionally, we analyze existing RAG metrics and datasets. Our
open-source library BERGEN is available at
\url{https://github.com/naver/bergen}.
DOI: 10.48550/arxiv.2407.01102
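The summary above refers to the standard RAG pipeline of a retriever, an optional reranker, and an LLM generator. The following is a minimal, self-contained sketch of that generic pipeline, not BERGEN's actual API: the toy collection, the word-overlap retriever, and the placeholder rerank/generate functions are all illustrative assumptions.

```python
# Illustrative sketch of a generic RAG pipeline (retrieve -> rerank -> generate).
# This is NOT BERGEN's API; all names and the toy scoring are assumptions.

from typing import List

COLLECTION = [
    "BERGEN is a benchmarking library for retrieval-augmented generation.",
    "Large Language Models can hallucinate facts without external knowledge.",
    "Rerankers reorder retrieved passages by estimated relevance to the query.",
]

def retrieve(query: str, collection: List[str], k: int = 2) -> List[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        collection,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def rerank(query: str, passages: List[str]) -> List[str]:
    """Placeholder reranker: a real system would score pairs with a cross-encoder."""
    return passages  # identity reranking in this sketch

def generate(query: str, passages: List[str]) -> str:
    """Placeholder generator: a real system would prompt an LLM with the passages."""
    context = "\n".join(passages)
    return f"Answer to '{query}' grounded in:\n{context}"

if __name__ == "__main__":
    question = "What is BERGEN used for?"
    candidates = retrieve(question, COLLECTION)
    reordered = rerank(question, candidates)
    print(generate(question, reordered))
```

A benchmarking library such as BERGEN swaps real retrievers, rerankers, and LLMs into this loop and standardizes the surrounding datasets and metrics; see the repository linked above for the actual interface.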