Banishing LLM Hallucinations Requires Rethinking Generalization

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
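The abstract describes Lamini-1 as storing facts in a large bank of memory experts that are retrieved dynamically. As a minimal, hypothetical sketch of that general idea (not the paper's actual Lamini-1 architecture), the toy layer below keeps a bank of learnable key/value memory slots and answers each query by routing to its top-k most similar slots; the slot count, routing rule, and all names here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMemoryExperts(nn.Module):
    """Toy 'mixture of memory experts': a bank of learnable key/value slots
    queried by dot-product similarity with sparse top-k routing.
    Illustrative sketch only -- not the Lamini-1 implementation."""

    def __init__(self, d_model: int, num_experts: int = 1024, top_k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, d_model); score every memory key
        scores = query @ self.keys.t()                        # (batch, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)               # (batch, top_k)
        selected = self.values[top_idx]                       # (batch, top_k, d_model)
        # Blend only the selected memory slots back to the hidden-state size
        return (weights.unsqueeze(-1) * selected).sum(dim=1)


if __name__ == "__main__":
    layer = ToyMemoryExperts(d_model=64, num_experts=1024, top_k=4)
    hidden = torch.randn(8, 64)   # stand-in for transformer hidden states
    out = layer(hidden)
    print(out.shape)              # torch.Size([8, 64])
```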

Bibliographic details
Published in: arXiv.org, 2024-06
Main authors: Li, Johnny; Consul, Saksham; Zhou, Eda; Wong, James; Farooqui, Naila; Ye, Yuxin; Nithyashree Manohar; Zhuxiaona Wei; Wu, Tian; Echols, Ben; Zhou, Sharon; Diamos, Gregory
Format: Article
Language: English
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Subjects: Hallucinations; Large language models; Mixtures; Neural networks; Random numbers
Online access: Full text