Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.

Detailed description

Saved in:
Bibliographic details
Published in: arXiv.org 2024-03
Main authors: Cao, Boxi, Tang, Qiaoyu, Lin, Hongyu, Jiang, Shanshan, Dong, Bin, Han, Xianpei, Chen, Jiawei, Wang, Tianshu, Sun, Le
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Cao, Boxi
Tang, Qiaoyu
Lin, Hongyu
Jiang, Shanshan
Dong, Bin
Han, Xianpei
Chen, Jiawei
Wang, Tianshu
Sun, Le
description Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-03
issn 2331-8422
language eng
recordid cdi_proquest_journals_2814621725
source Free E-Journals
subjects Algorithms
Machine learning
Neural networks
Training
title Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T17%3A41%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Retentive%20or%20Forgetful?%20Diving%20into%20the%20Knowledge%20Memorizing%20Mechanism%20of%20Language%20Models&rft.jtitle=arXiv.org&rft.au=Cao,%20Boxi&rft.date=2024-03-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2814621725%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2814621725&rft_id=info:pmid/&rfr_iscdi=true