Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been lo...

Bibliographic details
Published in:	arXiv.org 2024-03
Main authors:	Cao, Boxi; Tang, Qiaoyu; Lin, Hongyu; Jiang, Shanshan; Dong, Bin; Han, Xianpei; Chen, Jiawei; Wang, Tianshu; Sun, Le
Format:	Article
Language:	eng
Subjects:	Algorithms; Machine learning; Neural networks; Training
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Cao, Boxi; Tang, Qiaoyu; Lin, Hongyu; Jiang, Shanshan; Dong, Bin; Han, Xianpei; Chen, Jiawei; Wang, Tianshu; Sun, Le
description	Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2814621725</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2814621725</sourcerecordid><originalsourceid>FETCH-proquest_journals_28146217253</originalsourceid><addsrcrecordid>eNqNjssKgkAUQIcgSMp_uNBa0PGRuxaVBOUm2rQSyauO6Nyah0Ffn0Ef0OoszlmcGXN4GAZeGnG-YK7Wne_7PNnwOA4ddrugQWnEiEAKMlINmtr2W9iLUcgGhDQEpkU4SXr1WDUIOQ6kxPtrc7y3pRR6AKrhXMrGlt-AKuz1is3rstfo_rhk6-xw3R29h6KnRW2KjqySkyp4GkQJD6al8L_qA9RpQh0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2814621725</pqid></control><display><type>article</type><title>Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models</title><source>Free E- Journals</source><creator>Cao, Boxi ; Tang, Qiaoyu ; Lin, Hongyu ; Jiang, Shanshan ; Dong, Bin ; Han, Xianpei ; Chen, Jiawei ; Wang, Tianshu ; Sun, Le</creator><creatorcontrib>Cao, Boxi ; Tang, Qiaoyu ; Lin, Hongyu ; Jiang, Shanshan ; Dong, Bin ; Han, Xianpei ; Chen, Jiawei ; Wang, Tianshu ; Sun, Le</creatorcontrib><description>Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. 
These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Machine learning ; Neural networks ; Training</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Cao, Boxi</creatorcontrib><creatorcontrib>Tang, Qiaoyu</creatorcontrib><creatorcontrib>Lin, Hongyu</creatorcontrib><creatorcontrib>Jiang, Shanshan</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Han, Xianpei</creatorcontrib><creatorcontrib>Chen, Jiawei</creatorcontrib><creatorcontrib>Wang, Tianshu</creatorcontrib><creatorcontrib>Sun, Le</creatorcontrib><title>Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models</title><title>arXiv.org</title><description>Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. 
To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.</description><subject>Algorithms</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNjssKgkAUQIcgSMp_uNBa0PGRuxaVBOUm2rQSyauO6Nyah0Ffn0Ef0OoszlmcGXN4GAZeGnG-YK7Wne_7PNnwOA4ddrugQWnEiEAKMlINmtr2W9iLUcgGhDQEpkU4SXr1WDUIOQ6kxPtrc7y3pRR6AKrhXMrGlt-AKuz1is3rstfo_rhk6-xw3R29h6KnRW2KjqySkyp4GkQJD6al8L_qA9RpQh0</recordid><startdate>20240313</startdate><enddate>20240313</enddate><creator>Cao, Boxi</creator><creator>Tang, Qiaoyu</creator><creator>Lin, Hongyu</creator><creator>Jiang, Shanshan</creator><creator>Dong, Bin</creator><creator>Han, Xianpei</creator><creator>Chen, Jiawei</creator><creator>Wang, Tianshu</creator><creator>Sun, Le</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240313</creationdate><title>Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models</title><author>Cao, Boxi ; Tang, Qiaoyu ; Lin, Hongyu ; Jiang, Shanshan ; Dong, Bin ; Han, Xianpei ; Chen, Jiawei ; Wang, Tianshu ; Sun, Le</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28146217253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Cao, Boxi</creatorcontrib><creatorcontrib>Tang, Qiaoyu</creatorcontrib><creatorcontrib>Lin, Hongyu</creatorcontrib><creatorcontrib>Jiang, Shanshan</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Han, Xianpei</creatorcontrib><creatorcontrib>Chen, Jiawei</creatorcontrib><creatorcontrib>Wang, Tianshu</creatorcontrib><creatorcontrib>Sun, Le</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech 
Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Boxi</au><au>Tang, Qiaoyu</au><au>Lin, Hongyu</au><au>Jiang, Shanshan</au><au>Dong, Bin</au><au>Han, Xianpei</au><au>Chen, Jiawei</au><au>Wang, Tianshu</au><au>Sun, Le</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models</atitle><jtitle>arXiv.org</jtitle><date>2024-03-13</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. 
These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2814621725
source	Free E- Journals
subjects	Algorithms; Machine learning; Neural networks; Training
title	Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T17%3A41%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Retentive%20or%20Forgetful?%20Diving%20into%20the%20Knowledge%20Memorizing%20Mechanism%20of%20Language%20Models&rft.jtitle=arXiv.org&rft.au=Cao,%20Boxi&rft.date=2024-03-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2814621725%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2814621725&rft_id=info:pmid/&rfr_iscdi=true