Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge probing. However, LLMs often fail to express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, Dreamcatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as reward, we propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.
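The abstract describes Dreamcatcher, which combines knowledge probing with consistency checking to rank factual preference data, and RLKF, which uses that preference signal as a reinforcement-learning reward. The paper's implementation is not part of this record; the following is a minimal sketch of what consistency-based preference ranking could look like, assuming a hypothetical model object with a `sample` method. All function names, the string matching, and the threshold are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def generate_answers(model, question, n_samples=8):
    """Hypothetical helper: sample n answers from the model for one question.
    Stands in for whatever sampling interface the real system uses."""
    return [model.sample(question) for _ in range(n_samples)]

def consistency_score(answers):
    """Fraction of sampled answers that agree with the most common answer.
    A crude proxy for how confidently the model 'knows' the fact; the paper's
    Dreamcatcher tool may use a more elaborate consistency check."""
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / len(answers)

def build_preference_pair(model, question, reference_answer, threshold=0.7):
    """Build one factual-preference pair: prefer an answer the model produces
    consistently and that matches the reference; otherwise prefer an explicit
    admission of uncertainty (the 'honesty' behaviour RLKF is meant to reward)."""
    answers = generate_answers(model, question)
    majority, score = consistency_score(answers)
    if score >= threshold and majority == reference_answer:
        chosen, rejected = majority, "I don't know."
    else:
        chosen, rejected = "I don't know.", majority
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```

Preference pairs like these could then train a reward model for an RLKF-style reinforcement-learning loop; the actual Dreamcatcher and RLKF pipelines are detailed in the paper itself.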

Bibliographic Details
Authors: Liang, Yuxin; Song, Zhuoyang; Wang, Hao; Zhang, Jiaxing
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
Date: 2024-01-27
DOI: 10.48550/arxiv.2401.15449
Source: arXiv.org
Online access: Full text via arXiv (https://arxiv.org/abs/2401.15449)