Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge probing. However, LLMs often fail to express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, Dreamcatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as reward, we propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.
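The abstract describes Dreamcatcher, which combines knowledge probing with consistency checking to rank factual preference data, and RLKF, which uses that preference signal as a reinforcement-learning reward. The paper's implementation is not part of this record; the following is a minimal sketch of what consistency-based preference ranking could look like, assuming a hypothetical model object with a `sample` method. All function names, the string matching, and the threshold are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def generate_answers(model, question, n_samples=8):
    """Hypothetical helper: sample n answers from the model for one question.
    Stands in for whatever sampling interface the real system uses."""
    return [model.sample(question) for _ in range(n_samples)]

def consistency_score(answers):
    """Fraction of sampled answers that agree with the most common answer.
    A crude proxy for how confidently the model 'knows' the fact; the paper's
    Dreamcatcher tool may use a more elaborate consistency check."""
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / len(answers)

def build_preference_pair(model, question, reference_answer, threshold=0.7):
    """Build one factual-preference pair: prefer an answer the model produces
    consistently and that matches the reference; otherwise prefer an explicit
    admission of uncertainty (the 'honesty' behaviour RLKF is meant to reward)."""
    answers = generate_answers(model, question)
    majority, score = consistency_score(answers)
    if score >= threshold and majority == reference_answer:
        chosen, rejected = majority, "I don't know."
    else:
        chosen, rejected = "I don't know.", majority
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```

Preference pairs like these could then train a reward model for an RLKF-style reinforcement-learning loop; the actual Dreamcatcher and RLKF pipelines are detailed in the paper itself.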

Bibliographic Details
Authors: Liang, Yuxin; Song, Zhuoyang; Wang, Hao; Zhang, Jiaxing
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
Date: 2024-01-27
DOI: 10.48550/arxiv.2401.15449
Source: arXiv.org
Online access: Full text via arXiv (https://arxiv.org/abs/2401.15449)