Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks the data quality issues within retrieval results, often caused by inaccurate existing vector-distance-based retrieval methods. We...
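Main Authors: | Ma, Kexin ; Jin, Ruochun ; Wang, Xi ; Chen, Huan ; Ren, Jing ; Tang, Yuhua |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language ; Computer Science - Databases |
Online Access: | Order full text |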
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Ma, Kexin ; Jin, Ruochun ; Wang, Xi ; Chen, Huan ; Ren, Jing ; Tang, Yuhua |
description | Retrieval-Augmented Large Language Models (RALMs) have made significant
strides in enhancing the accuracy of generated responses. However, existing
research often overlooks the data quality issues within retrieval results,
often caused by inaccurate existing vector-distance-based retrieval methods. We
propose to boost the precision of RALMs' answers from a data quality
perspective through the Context-Driven Index Trimming (CDIT) framework, where
Context Matching Dependencies (CMDs) are employed as logical data quality rules
to capture and regulate the consistency between retrieved contexts. Based on the
semantic comprehension capabilities of Large Language Models (LLMs), CDIT can
effectively identify and discard retrieval results that are inconsistent with
the query context and further modify indexes in the database, thereby improving
answer quality. Experiments on challenging question-answering tasks demonstrate
this improvement. Also, the flexibility of CDIT is verified through its compatibility with
various language models and indexing methods, which offers a promising approach
to bolster RALMs' data quality and retrieval precision jointly. |
doi_str_mv | 10.48550/arxiv.2408.05524 |
format | Article |
fullrecord | source: arXiv.org (Open Access Repository) ; recordtype: article ; creationdate: 2024-08-10 ; rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0 ; oa: free_for_read ; linktorsrc: https://arxiv.org/abs/2408.05524 ; backlink: https://doi.org/10.48550/arXiv.2408.05524 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2408.05524 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2408_05524 |
source | arXiv.org |
subjects | Computer Science - Computation and Language ; Computer Science - Databases |
title | Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A32%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Context-Driven%20Index%20Trimming:%20A%20Data%20Quality%20Perspective%20to%20Enhancing%20Precision%20of%20RALMs&rft.au=Ma,%20Kexin&rft.date=2024-08-10&rft_id=info:doi/10.48550/arxiv.2408.05524&rft_dat=%3Carxiv_GOX%3E2408_05524%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |