On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Training neural networks with batch normalization and weight decay has become a common practice in recent years. In this work, we show that their combined use may result in a surprising periodic behavior of optimization dynamics: the training process regularly exhibits destabilizations that, however, do not lead to complete divergence but cause a new period of training. We rigorously investigate the mechanism underlying the discovered periodic behavior from both empirical and theoretical points of view and analyze the conditions in which it occurs in practice. We also demonstrate that periodic behavior can be regarded as a generalization of two previously opposing perspectives on training with batch normalization and weight decay, namely the equilibrium presumption and the instability presumption.
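The abstract describes cyclic destabilizations that arise when batch normalization is combined with weight decay. The following minimal sketch is not taken from the paper; the toy data, architecture, and hyperparameters are illustrative assumptions. It trains a small batch-normalized network in PyTorch with SGD and weight decay while logging the training loss and the norm of the first layer's weights, which is one simple way to look for such periodic behavior in practice.

# Minimal sketch (assumed setup, not the paper's experiments): train a small
# batch-normalized network with SGD + weight decay and log the training loss
# and the norm of the first layer's weights. Periodic destabilizations, when
# they occur, show up as recurring loss spikes accompanied by drops and
# regrowth of the weight norm.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data standing in for a real dataset (illustrative assumption).
X = torch.randn(512, 32)
y = torch.randint(0, 2, (512,))

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.BatchNorm1d(64),  # batch normalization after the first linear layer
    nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        w_norm = model[0].weight.norm().item()  # norm of the BN-preceded weights
        print(f"step {step:4d}  loss {loss.item():.3f}  ||W1|| = {w_norm:.3f}")

Whether and how quickly any cycles appear depends strongly on the learning rate and the weight-decay coefficient; the values above are placeholders, not a reproduction of the paper's settings.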

Bibliographic details

Published in: arXiv.org, 2022-01
Main authors: Lobacheva, Ekaterina; Kodryan, Maxim; Chirkova, Nadezhda; Malinin, Andrey; Vetrov, Dmitry
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Convergence; Decay; Neural networks; Training; Weight
Online access: Full text