Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method into smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method.
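
The abstract outlines an iterative two-phase procedure. Below is a minimal Python sketch of that loop, given only to illustrate the idea: the helpers generate, retrieve, and ground, their signatures, and the stopping rule are assumptions made for this sketch, not the interface described in the paper.

    # Minimal sketch of the generate-then-ground loop described in the abstract.
    # The callables below stand in for LLM and retriever calls; their names and
    # signatures are illustrative assumptions, not the paper's actual interface.
    from typing import Callable, List, Optional, Tuple

    def gen_ground(
        question: str,
        generate: Callable[[str], Tuple[Optional[str], str]],  # context -> (sub-question or None, tentative answer)
        retrieve: Callable[[str], List[str]],                   # query -> supporting documents
        ground: Callable[[str, str, List[str]], str],           # (sub-question, tentative, docs) -> revised answer
        max_hops: int = 4,
    ) -> str:
        """Alternate generation and grounding phases until a final answer is derived."""
        history: List[str] = []
        answer = ""
        for _ in range(max_hops):
            # Phase 1: deduce a simpler, single-hop question and answer it directly
            # from the model's parametric knowledge, given what is already resolved.
            sub_q, tentative = generate(question + "\n" + "\n".join(history))
            if sub_q is None:  # the model judges the original question answered
                return tentative
            # Phase 2: ground the question-answer pair in retrieved documents,
            # amending the tentative answer when the evidence contradicts it.
            docs = retrieve(sub_q)
            answer = ground(sub_q, tentative, docs)
            history.append(f"{sub_q} -> {answer}")
        return answer
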

Bibliographic details
Main authors: Shi, Zhengliang; Sun, Weiwei; Gao, Shen; Ren, Pengjie; Chen, Zhumin; Ren, Zhaochun
Format: Article
Language: English
Subjects: Computer Science - Computation and Language; Computer Science - Information Retrieval
Online access: Order full text
DOI: 10.48550/arxiv.2406.14891
Date: 2024-06-21
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0 (open access)
Link: https://arxiv.org/abs/2406.14891
Source: arXiv.org