Solvable Model for Inheriting the Regularization through Knowledge Distillation



Bibliographic details
Main authors: Saglietti, Luca; Zdeborová, Lenka
Format: Article
Language: eng
Subjects: Computer Science - Learning; Computer Science - Neural and Evolutionary Computing; Physics - Disordered Systems and Neural Networks
container_end_page 846
container_issue
container_start_page 809
container_title Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference
container_volume 145
creator Saglietti, Luca
Zdeborová, Lenka
description Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:809-846, 2022. In recent years the empirical success of transfer learning with neural networks has stimulated an increasing interest in obtaining a theoretical understanding of its core properties. Knowledge distillation, where a smaller neural network is trained using the outputs of a larger neural network, is a particularly interesting case of transfer learning. In the present work, we introduce a statistical physics framework that allows an analytic characterization of the properties of knowledge distillation (KD) in shallow neural networks. Focusing the analysis on a solvable model that exhibits a non-trivial generalization gap, we investigate the effectiveness of KD. We show that, through KD, the regularization properties of the larger teacher model can be inherited by the smaller student, and that the resulting generalization performance is closely linked to, and limited by, the optimality of the teacher. Finally, we analyze the double descent phenomenology that can arise in the considered KD setting.
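For context on the setup the abstract refers to (a smaller student network trained on the outputs of a larger teacher network), the sketch below shows a generic Hinton-style knowledge-distillation loss in PyTorch. It is a minimal illustration of the general technique only, not the paper's analytically solvable shallow-network model; the function name distillation_loss, the temperature T, and the mixing weight alpha are illustrative choices, not quantities defined in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation objective: a weighted sum of the usual
    cross-entropy on the true labels and a KL term that matches the student's
    temperature-softened outputs to the teacher's."""
    # Hard-label term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-scaled output distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with hypothetical sizes: a batch of 8 examples and 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In the paper's setting the teacher is the larger, better-regularized network, so the soft-target term above is the channel through which the teacher's regularization can be inherited by the student; the particular hard/soft weighting is a common design choice, not one taken from the paper.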
doi_str_mv 10.48550/arxiv.2012.00194
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2012.00194
ispartof Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:809-846, 2022
issn
language eng
recordid cdi_arxiv_primary_2012_00194
source arXiv.org
subjects Computer Science - Learning
Computer Science - Neural and Evolutionary Computing
Physics - Disordered Systems and Neural Networks
title Solvable Model for Inheriting the Regularization through Knowledge Distillation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T12%3A17%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Solvable%20Model%20for%20Inheriting%20the%20Regularization%20through%20Knowledge%20Distillation&rft.au=Saglietti,%20Luca&rft.date=2020-11-30&rft_id=info:doi/10.48550/arxiv.2012.00194&rft_dat=%3Carxiv_GOX%3E2012_00194%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true