Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Retrieve-then-read and generate-then-read are two typical solutions for handling
unknown and known questions in open-domain question answering: the former
retrieves the necessary external knowledge, and the latter prompts large language
models to generate internal knowledge encoded in their parameters. However,
few previous works consider compositional unknown questions, which
consist of several known or unknown sub-questions. Simple binary
classification (known or unknown) therefore becomes sub-optimal and inefficient, since it
calls external retrieval excessively for each compositional unknown
question. To this end, we propose the first Compositional unknown
Question-Answering dataset (CuQA), and introduce a Self Divide-and-Conquer
(Self-DC) framework that enables LLMs to adaptively call different methods
on demand, resulting in better performance and efficiency. Experimental results
on two datasets (CuQA and FreshQA) demonstrate that Self-DC can achieve
comparable or even better performance with far fewer retrieval calls
than several strong baselines. |
---|---|
DOI: | 10.48550/arxiv.2402.13514 |
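
The routing idea described in the summary (answer known questions by generation, unknown ones by retrieval, and split compositional ones into sub-questions) might be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the confidence scores, the two thresholds `low` and `high`, and the toy stubs are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List


def self_dc(
    question: str,
    confidence: Callable[[str], float],   # hypothetical: model's self-assessed confidence in [0, 1]
    decompose: Callable[[str], List[str]],  # hypothetical: split a question into sub-questions
    generate: Callable[[str], str],       # generate-then-read path (no retrieval)
    retrieve: Callable[[str], str],       # retrieve-then-read path (external call)
    combine: Callable[[str, List[str]], str],
    low: float = 0.3,                     # assumed threshold, not from the source
    high: float = 0.7,                    # assumed threshold, not from the source
    max_depth: int = 2,
) -> str:
    """Route a question to generation, retrieval, or decomposition."""
    score = confidence(question)
    if score >= high:
        return generate(question)         # treated as known
    if score <= low or max_depth == 0:
        return retrieve(question)         # treated as unknown
    subs = decompose(question)            # uncertain: try divide-and-conquer
    if len(subs) <= 1:
        return retrieve(question)         # could not split, fall back to retrieval
    answers = [
        self_dc(s, confidence, decompose, generate, retrieve, combine,
                low, high, max_depth - 1)
        for s in subs
    ]
    return combine(question, answers)


# Toy stubs so the sketch runs end to end; none of this reflects real model behavior.
retrieval_calls: List[str] = []


def toy_confidence(q: str) -> float:
    if " and " in q:
        return 0.5                        # compositional: uncertain
    return 0.9 if q == "capital of France" else 0.1


def toy_retrieve(q: str) -> str:
    retrieval_calls.append(q)             # count external calls
    return f"ret:{q}"


answer = self_dc(
    "capital of France and winner of the 2030 final",
    confidence=toy_confidence,
    decompose=lambda q: q.split(" and "),
    generate=lambda q: f"gen:{q}",
    retrieve=toy_retrieve,
    combine=lambda q, parts: " | ".join(parts),
)
print(answer)
print(len(retrieval_calls))
```

In this toy run only the sub-question judged unknown triggers a retrieval call, whereas a binary known/unknown classifier would have sent the whole compositional question to retrieval.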