SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models

In recent years, large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities. However, a pressing challenge is their tendency to make confidently wrong predictions, highlighting the critical need for uncertainty quantification (UQ) in LLMs. While previous works have mainly focused on addressing aleatoric uncertainty, the full spectrum of uncertainties, including epistemic uncertainty, remains inadequately explored. Motivated by this gap, we introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties. The method entails generating a set of perturbations for LLM inputs, sampling outputs for each perturbation, and incorporating an aggregation module that generalizes the sampling-based uncertainty approach for text generation tasks. Through extensive experiments on various datasets, we investigated different perturbation and aggregation techniques. Our findings show a substantial improvement in model uncertainty calibration, with a reduction in Expected Calibration Error (ECE) of 50% on average. These results suggest that the proposed UQ method offers a promising step toward enhancing the reliability and trustworthiness of LLMs.
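
The pipeline described in the abstract lends itself to a short sketch. The following Python code is a minimal, illustrative rendering only, not the paper's implementation: llm_sample, perturb, similarity, n_perturbations, and n_samples_per_prompt are all hypothetical placeholder names, and the pairwise-similarity aggregation shown here is just one plausible instance of the aggregation module the abstract mentions.

    # Minimal sketch of a SPUQ-style pipeline (illustrative only).
    # Assumptions: llm_sample(prompt) returns one sampled completion,
    # perturb(prompt) returns a meaning-preserving variant of the prompt,
    # and similarity(a, b) scores agreement between two texts in [0, 1].
    # None of these names come from the paper; they are placeholders.

    from typing import Callable, List

    def spuq_confidence(
        prompt: str,
        llm_sample: Callable[[str], str],
        perturb: Callable[[str], str],
        similarity: Callable[[str, str], float],
        n_perturbations: int = 5,
        n_samples_per_prompt: int = 3,
    ) -> float:
        """Return a confidence score in [0, 1] for the model's answer.

        Epistemic uncertainty is probed by perturbing the input;
        aleatoric uncertainty by sampling multiple outputs per input.
        """
        # 1. Build the set of inputs: the original prompt plus perturbations.
        prompts = [prompt] + [perturb(prompt) for _ in range(n_perturbations)]

        # 2. Sample several outputs for each (perturbed) input.
        outputs: List[str] = []
        for p in prompts:
            outputs.extend(llm_sample(p) for _ in range(n_samples_per_prompt))

        # 3. Aggregate: mean similarity of every output to the first answer
        #    for the unperturbed prompt. High agreement across perturbations
        #    and samples implies high confidence (low uncertainty).
        reference = outputs[0]
        scores = [similarity(reference, o) for o in outputs[1:]]
        return sum(scores) / len(scores)

Calibration of such a score can then be evaluated with Expected Calibration Error: predictions are binned by confidence, and ECE is the weighted average over bins of the gap between per-bin accuracy and per-bin mean confidence.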

Bibliographic Details
Main authors: Gao, Xiang; Zhang, Jiaxin; Mouatadid, Lalla; Das, Kamalika
Format: Article
Language: English
Published: 2024-03-04
DOI: 10.48550/arxiv.2403.02509
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Online access: https://arxiv.org/abs/2403.02509