Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
| Main authors: | , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Abstract: | The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written text to a mixture of human-written and LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, the primary challenge being the lack of a dedicated benchmark for researchers. In this paper, we introduce Cocktail, a comprehensive benchmark for evaluating IR models in the mixed-source data landscape of the LLM era. Cocktail consists of 16 diverse datasets with mixed human-written and LLM-generated corpora across various text retrieval tasks and domains. Additionally, to avoid potential bias from dataset information already included in LLMs, we introduce an up-to-date dataset, NQ-UTD, whose queries are derived from recent events. Through more than 1,000 experiments assessing state-of-the-art retrieval models on the benchmarked datasets in Cocktail, we uncover a clear trade-off between ranking performance and source bias in neural retrieval models, highlighting the need for a balanced approach in designing future IR systems. We hope Cocktail can serve as a foundational resource for IR research in the LLM era, with all data and code publicly available at \url{https://github.com/KID-22/Cocktail}. |
| DOI: | 10.48550/arxiv.2405.16546 |
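For readers unfamiliar with the "source bias" the abstract refers to, the following minimal Python sketch illustrates one plausible way to quantify it: compute a ranking metric (e.g., nDCG@10) separately over human-written and LLM-generated versions of the corpus and report their relative difference. The function name and the exact metric definition here are illustrative assumptions, not the benchmark's published code; see the repository linked above for the authors' actual implementation.

```python
# Hedged sketch: source bias as the relative difference between a ranking
# metric computed on human-written vs. LLM-generated documents. The exact
# definition is an assumption for illustration; consult
# https://github.com/KID-22/Cocktail for the authors' implementation.

def relative_delta(metric_human: float, metric_llm: float) -> float:
    """Relative difference (%) between per-source metric values.

    Positive values mean the retriever favors human-written documents;
    negative values indicate a bias toward LLM-generated documents.
    """
    mean = (metric_human + metric_llm) / 2
    if mean == 0:
        return 0.0
    return (metric_human - metric_llm) / mean * 100


if __name__ == "__main__":
    # Hypothetical per-source nDCG@10 scores for one retrieval model.
    ndcg_human, ndcg_llm = 0.412, 0.455
    print(f"Relative Delta: {relative_delta(ndcg_human, ndcg_llm):+.2f}%")
    # A negative result means the model ranks LLM-generated documents
    # higher, i.e., it exhibits source bias toward AI-generated content.
```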