SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models

In recent years, large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities. However, a pressing challenge is their tendency to make confidently wrong predictions, highlighting the critical need for uncertainty quantification (UQ) in LLMs. While previous works have mainly focused on addressing aleatoric uncertainty, the full spectrum of uncertainties, including epistemic uncertainty, remains inadequately explored. Motivated by this gap, we introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties. The method entails generating a set of perturbations for LLM inputs, sampling outputs for each perturbation, and incorporating an aggregation module that generalizes the sampling-based uncertainty approach for text generation tasks. Through extensive experiments on various datasets, we investigated different perturbation and aggregation techniques. Our findings show a substantial improvement in model uncertainty calibration, with a reduction in Expected Calibration Error (ECE) of 50% on average. These results suggest that the proposed UQ method offers a promising step toward enhancing the reliability and trustworthiness of LLMs.
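
The pipeline described in the abstract lends itself to a short sketch. The following Python code is a minimal, illustrative rendering only, not the paper's implementation: llm_sample, perturb, similarity, n_perturbations, and n_samples_per_prompt are all hypothetical placeholder names, and the pairwise-similarity aggregation shown here is just one plausible instance of the aggregation module the abstract mentions.

    # Minimal sketch of a SPUQ-style pipeline (illustrative only).
    # Assumptions: llm_sample(prompt) returns one sampled completion,
    # perturb(prompt) returns a meaning-preserving variant of the prompt,
    # and similarity(a, b) scores agreement between two texts in [0, 1].
    # None of these names come from the paper; they are placeholders.

    from typing import Callable, List

    def spuq_confidence(
        prompt: str,
        llm_sample: Callable[[str], str],
        perturb: Callable[[str], str],
        similarity: Callable[[str, str], float],
        n_perturbations: int = 5,
        n_samples_per_prompt: int = 3,
    ) -> float:
        """Return a confidence score in [0, 1] for the model's answer.

        Epistemic uncertainty is probed by perturbing the input;
        aleatoric uncertainty by sampling multiple outputs per input.
        """
        # 1. Build the set of inputs: the original prompt plus perturbations.
        prompts = [prompt] + [perturb(prompt) for _ in range(n_perturbations)]

        # 2. Sample several outputs for each (perturbed) input.
        outputs: List[str] = []
        for p in prompts:
            outputs.extend(llm_sample(p) for _ in range(n_samples_per_prompt))

        # 3. Aggregate: mean similarity of every output to the first answer
        #    for the unperturbed prompt. High agreement across perturbations
        #    and samples implies high confidence (low uncertainty).
        reference = outputs[0]
        scores = [similarity(reference, o) for o in outputs[1:]]
        return sum(scores) / len(scores)

Calibration of such a score can then be evaluated with Expected Calibration Error: predictions are binned by confidence, and ECE is the weighted average over bins of the gap between per-bin accuracy and per-bin mean confidence.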

Bibliographic Details
Main authors: Gao, Xiang; Zhang, Jiaxin; Mouatadid, Lalla; Das, Kamalika
Format: Article
Language: English
Published: 2024-03-04
DOI: 10.48550/arxiv.2403.02509
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Online access: https://arxiv.org/abs/2403.02509