Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both on a theoretical examination of a toy example and on a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural networks training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
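The abstract describes optimizing scale-invariant parameters directly on the unit sphere with a fixed effective learning rate (ELR). The minimal sketch below is not code from the paper; it only illustrates the setup: a toy loss that depends on its weights solely through their direction (hence loss(c*w) == loss(w)), optimized by a plain gradient step with a fixed ELR followed by re-projection onto the sphere. The toy loss, problem dimensions, and ELR values are assumptions made for illustration, not the paper's models or thresholds.

```python
# Illustrative sketch (not taken from the paper): a toy scale-invariant loss
# and projected gradient descent on the unit sphere with a fixed ELR.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))   # toy inputs
y = rng.normal(size=128)         # toy targets

def loss(w):
    # The weights enter only through w / ||w||, so the loss is scale
    # invariant and its intrinsic domain is the unit sphere.
    u = w / np.linalg.norm(w)
    return 0.5 * np.mean((X @ u - y) ** 2)

def grad(w, eps=1e-6):
    # Numerical gradient via central finite differences (for brevity).
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

def gd_on_sphere(elr, steps=500):
    # Fixed-ELR gradient step, then retraction back onto the unit sphere.
    w = rng.normal(size=16)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        w = w - elr * grad(w)
        w /= np.linalg.norm(w)
    return loss(w)

# Sweep small, moderate, and large fixed ELRs; the abstract associates such a
# sweep with convergence, chaotic equilibrium, and divergence, respectively.
# The values below are arbitrary and not the regime boundaries from the paper.
for elr in (1e-3, 1e-1, 10.0):
    print(f"ELR={elr:g}  final loss={gd_on_sphere(elr):.4f}")
```

Because a scale-invariant loss has a gradient orthogonal to the current weights, the step never collapses the norm to zero, and the re-normalization keeps the iterates on the sphere; only the fixed ELR controls which of the three regimes the trajectory falls into.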

Bibliographic Details
Published in: arXiv.org, 2023-01
Main authors: Kodryan, Maxim; Lobacheva, Ekaterina; Nakhodnov, Maksim; Vetrov, Dmitry
Format: Article
Language: English
Subjects: Deep learning; Empirical analysis; Invariants; Neural networks; Optimization; Parameters; Training
EISSN: 2331-8422
Online access: Full text