Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Retrieval-augmented generation (RAG) is increasingly recognized as an
effective approach for mitigating the hallucination of large language models
(LLMs) through the integration of external knowledge. Despite numerous efforts,
most studies focus on a single type of external knowledge source. However, in
real-world applications, most situations involve diverse knowledge from various
sources, yet this area has been less explored. The main obstacle is the lack of
a suitable dataset containing multiple knowledge sources and of prior exploration of
the associated issues. To address these challenges, we standardize a benchmark
dataset that combines structured and unstructured knowledge across diverse and
complementary domains. Based on this dataset, we further develop a
plug-and-play RAG framework, PruningRAG, whose main characteristic is to employ
multi-granularity pruning strategies for optimizing the integration of relevant
information and minimizing misleading context. Building upon the standardized
dataset and PruningRAG, we also report a series of experimental results, as
well as insightful findings. Our dataset and code are publicly
available at https://github.com/USTCAGI/PruningRAG, with the aim of
advancing future research in the RAG community. |
---|---|
DOI: | 10.48550/arxiv.2409.13694 |