An Analytical Theory of Curriculum Learning in Teacher-Student Networks


Bibliographic Details
Published in: arXiv.org 2022-10
Main authors: Saglietti, Luca; Stefano Sarao Mannelli; Saxe, Andrew
Format: Article
Language: English
Subjects:
Online access: Full text
description In humans and animals, curriculum learning -- presenting data in a curated order -- is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate benefits. This stark difference in the importance of curriculum raises a fundamental theoretical question: when and why does curriculum learning help? In this work, we analyse a prototypical neural network model of curriculum learning in the high-dimensional limit, employing statistical physics methods. Curricula could in principle change both the learning speed and asymptotic performance of a model. To study the former, we provide an exact description of the online learning setting, confirming the long-standing experimental observation that curricula can modestly speed up learning. To study the latter, we derive performance in a batch learning setting, in which a network trains to convergence in successive phases of learning on dataset slices of varying difficulty. With standard training losses, curriculum does not provide generalisation benefit, in line with empirical observations. However, we show that by connecting different learning phases through simple Gaussian priors, curriculum can yield a large improvement in test performance. Taken together, our reduced analytical descriptions help reconcile apparently conflicting empirical results and trace regimes where curriculum learning yields the largest gains. More broadly, our results suggest that fully exploiting a curriculum may require explicit changes to the loss function at curriculum boundaries.
doi_str_mv 10.48550/arxiv.2106.08068
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-10
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2106_08068
source arXiv.org; Free E-Journals
subjects Asymptotic methods
Computer Science - Learning
Core curriculum
Curricula
Distance learning
Empirical analysis
Machine learning
Neural networks
Physics - Disordered Systems and Neural Networks
Statistical methods
Statistics - Machine Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T06%3A58%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Analytical%20Theory%20of%20Curriculum%20Learning%20in%20Teacher-Student%20Networks&rft.jtitle=arXiv.org&rft.au=Saglietti,%20Luca&rft.date=2022-10-12&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2106.08068&rft_dat=%3Cproquest_arxiv%3E2541574894%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2541574894&rft_id=info:pmid/&rfr_iscdi=true