Exact learning dynamics of deep linear networks with prior knowledge

Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge i...

Bibliographic details
Published in:	Journal of statistical mechanics, 2023-11, Vol.2023 (11), p.114004-114004
Main authors:	J Dominé, Clémentine C; Braun, Lukas; Fitzgerald, James E; Saxe, Andrew M
Format:	Article
Language:	eng
Subjects:	deep learning; learning theory; machine learning; Machine Learning 2023
Online access:	Full text

container_end_page	114004
container_issue	11
container_start_page	114004
container_title	Journal of statistical mechanics
container_volume	2023
creator	J Dominé, Clémentine C; Braun, Lukas; Fitzgerald, James E; Saxe, Andrew M
description	Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998 1E-03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.
doi_str_mv	10.1088/1742-5468/ad01b8
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2985795000</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2985795000</sourcerecordid><originalsourceid>FETCH-LOGICAL-c387t-e9898140baa184d44d8be75658f4f5af6a0599e7a88bfa209420dc9a906a5fad3</originalsourceid><addsrcrecordid>eNp1kL1PwzAQxS0EolDYmZBHBkqdxE7sCaFSPqRKLDBbl9hp3SZ2sFNK_3tStVRlYLrTvXfvTj-EriJyFxHOh1FG4wGjKR-CIlHOj9DZfnR80PfQeQhzQpKYUH6KeglnMY1ZcoYex99QtLjS4K2xU6zWFmpTBOxKrLRucGVsp2Gr25Xzi4BXpp3hxhvn8cK6VaXVVF-gkxKqoC93tY8-nsbvo5fB5O35dfQwGRQJz9qBFlzwiJIcIOJUUap4rjOWMl7SkkGZAmFC6Aw4z0uIiaAxUYUAQVJgJaikj-63uc0yr7UqtG09VLL7pga_lg6M_KtYM5NT9yUjIhhLSdIl3OwSvPtc6tDK2oRCVxVY7ZZBxoKzTDDSoeojsrUW3oXgdbm_ExG5oS83eOUGr9zS71auD__bL_zi7gy3W4NxjZy7pbcdrv_zfgDYlI-L</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2985795000</pqid></control><display><type>article</type><title>Exact learning dynamics of deep linear networks with prior knowledge</title><source>HEAL-Link subscriptions: Institute of Physics (IOP) Journals</source><source>Institute of Physics Journals</source><creator>J Dominé, Clémentine C ; Braun, Lukas ; Fitzgerald, James E ; Saxe, Andrew M</creator><creatorcontrib>J Dominé, Clémentine C ; Braun, Lukas ; Fitzgerald, James E ; Saxe, Andrew M</creatorcontrib><description>Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998 1E-03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. 
The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.</description><identifier>ISSN: 1742-5468</identifier><identifier>EISSN: 1742-5468</identifier><identifier>DOI: 10.1088/1742-5468/ad01b8</identifier><identifier>PMID: 38524253</identifier><language>eng</language><publisher>England: IOP Publishing</publisher><subject>deep learning ; learning theory ; machine learning ; Machine Learning 2023</subject><ispartof>Journal of statistical mechanics, 2023-11, Vol.2023 (11), p.114004-114004</ispartof><rights>2023 The Author(s). Published on behalf of SISSA Medialab srl by IOP Publishing Ltd</rights><rights>2023 The Author(s). Published on behalf of SISSA Medialab srl by IOP Publishing Ltd.</rights><rights>2023 The Author(s). 
Published on behalf of SISSA Medialab srl by IOP Publishing Ltd 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c387t-e9898140baa184d44d8be75658f4f5af6a0599e7a88bfa209420dc9a906a5fad3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://iopscience.iop.org/article/10.1088/1742-5468/ad01b8/pdf$$EPDF$$P50$$Giop$$Hfree_for_read</linktopdf><link.rule.ids>230,314,776,780,881,27903,27904,53824,53871</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38524253$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>J Dominé, Clémentine C</creatorcontrib><creatorcontrib>Braun, Lukas</creatorcontrib><creatorcontrib>Fitzgerald, James E</creatorcontrib><creatorcontrib>Saxe, Andrew M</creatorcontrib><title>Exact learning dynamics of deep linear networks with prior knowledge</title><title>Journal of statistical mechanics</title><addtitle>JSTAT</addtitle><addtitle>J. Stat. Mech</addtitle><description>Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998 1E-03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. 
The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.</description><subject>deep learning</subject><subject>learning theory</subject><subject>machine learning</subject><subject>Machine Learning 2023</subject><issn>1742-5468</issn><issn>1742-5468</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>O3W</sourceid><recordid>eNp1kL1PwzAQxS0EolDYmZBHBkqdxE7sCaFSPqRKLDBbl9hp3SZ2sFNK_3tStVRlYLrTvXfvTj-EriJyFxHOh1FG4wGjKR-CIlHOj9DZfnR80PfQeQhzQpKYUH6KeglnMY1ZcoYex99QtLjS4K2xU6zWFmpTBOxKrLRucGVsp2Gr25Xzi4BXpp3hxhvn8cK6VaXVVF-gkxKqoC93tY8-nsbvo5fB5O35dfQwGRQJz9qBFlzwiJIcIOJUUap4rjOWMl7SkkGZAmFC6Aw4z0uIiaAxUYUAQVJgJaikj-63uc0yr7UqtG09VLL7pga_lg6M_KtYM5NT9yUjIhhLSdIl3OwSvPtc6tDK2oRCVxVY7ZZBxoKzTDDSoeojsrUW3oXgdbm_ExG5oS83eOUGr9zS71auD__bL_zi7gy3W4NxjZy7pbcdrv_zfgDYlI-L</recordid><startdate>20231101</startdate><enddate>20231101</enddate><creator>J Dominé, Clémentine C</creator><creator>Braun, Lukas</creator><creator>Fitzgerald, James E</creator><creator>Saxe, Andrew M</creator><general>IOP 
Publishing</general><scope>O3W</scope><scope>TSCCA</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20231101</creationdate><title>Exact learning dynamics of deep linear networks with prior knowledge</title><author>J Dominé, Clémentine C ; Braun, Lukas ; Fitzgerald, James E ; Saxe, Andrew M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c387t-e9898140baa184d44d8be75658f4f5af6a0599e7a88bfa209420dc9a906a5fad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>deep learning</topic><topic>learning theory</topic><topic>machine learning</topic><topic>Machine Learning 2023</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>J Dominé, Clémentine C</creatorcontrib><creatorcontrib>Braun, Lukas</creatorcontrib><creatorcontrib>Fitzgerald, James E</creatorcontrib><creatorcontrib>Saxe, Andrew M</creatorcontrib><collection>Open Access: IOP Publishing Free Content</collection><collection>IOPscience (Open Access)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of statistical mechanics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>J Dominé, Clémentine C</au><au>Braun, Lukas</au><au>Fitzgerald, James E</au><au>Saxe, Andrew M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exact learning dynamics of deep linear networks with prior knowledge</atitle><jtitle>Journal of statistical mechanics</jtitle><stitle>JSTAT</stitle><addtitle>J. Stat. 
Mech</addtitle><date>2023-11-01</date><risdate>2023</risdate><volume>2023</volume><issue>11</issue><spage>114004</spage><epage>114004</epage><pages>114004-114004</pages><issn>1742-5468</issn><eissn>1742-5468</eissn><abstract>Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998 1E-03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.</abstract><cop>England</cop><pub>IOP Publishing</pub><pmid>38524253</pmid><doi>10.1088/1742-5468/ad01b8</doi><tpages>48</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1742-5468
ispartof	Journal of statistical mechanics, 2023-11, Vol.2023 (11), p.114004-114004
issn	1742-5468 1742-5468
language	eng
recordid	cdi_proquest_miscellaneous_2985795000
source	HEAL-Link subscriptions: Institute of Physics (IOP) Journals; Institute of Physics Journals
subjects	deep learning; learning theory; machine learning; Machine Learning 2023
title	Exact learning dynamics of deep linear networks with prior knowledge
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T18%3A42%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exact%20learning%20dynamics%20of%20deep%20linear%20networks%20with%20prior%20knowledge&rft.jtitle=Journal%20of%20statistical%20mechanics&rft.au=J%20Domin%C3%A9,%20Cl%C3%A9mentine%20C&rft.date=2023-11-01&rft.volume=2023&rft.issue=11&rft.spage=114004&rft.epage=114004&rft.pages=114004-114004&rft.issn=1742-5468&rft.eissn=1742-5468&rft_id=info:doi/10.1088/1742-5468/ad01b8&rft_dat=%3Cproquest_cross%3E2985795000%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2985795000&rft_id=info:pmid/38524253&rfr_iscdi=true