Towards Understanding the Importance of Shortcut Connections in Residual Networks

Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite its great empirical success, the reason behind it is far from being well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.
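The training scheme the abstract describes can be sketched on a toy problem. This is a schematic illustration, not the paper's exact model or analysis: the identity shortcut is replaced by a fixed direction `w0`, the paper's normalization step is omitted, and the dimensions, step size, and teacher construction are all invented for illustration.

```python
import numpy as np

# Toy sketch of the initialization scheme from the abstract:
# first-layer weight started at 0, second-layer weight drawn from a ball.
# NOT the paper's exact model: the shortcut connection is modeled by a
# fixed direction w0, and the normalization step is omitted.

rng = np.random.default_rng(0)
d, n, lr, steps = 5, 200, 0.05, 2000

w0 = np.zeros(d); w0[0] = 1.0            # stand-in for the shortcut connection
w_star = 0.5 * rng.standard_normal(d)    # planted "teacher" first-layer weight
X = rng.standard_normal((n, d))
y = np.maximum(X @ (w0 + w_star), 0.0)   # teacher labels (second layer = 1)

w = np.zeros(d)                          # first layer: initialized at 0
a = rng.uniform(-1.0, 1.0)               # second layer: arbitrary in a ball

def loss(w, a):
    h = np.maximum(X @ (w0 + w), 0.0)    # ReLU hidden activation
    return 0.5 * np.mean((a * h - y) ** 2)

loss0 = loss(w, a)
for _ in range(steps):
    z = X @ (w0 + w)
    h = np.maximum(z, 0.0)
    r = a * h - y                        # residuals
    grad_a = np.mean(r * h)
    grad_w = a * (X.T @ (r * (z > 0))) / n
    a -= lr * grad_a
    w -= lr * grad_w

print(loss0, loss(w, a))
```

Note the role of the shortcut in this sketch: at `w = 0` the hidden unit still computes `relu(x @ w0)` rather than a dead zero, so the gradients at the zero initialization carry signal; without the shortcut term, a ReLU unit initialized at 0 would produce no gradient at all.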

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Liu, Tianyi, Chen, Minshuo, Zhou, Mo, Du, Simon S, Zhou, Enlu, Zhao, Tuo
Format: Article
Language: eng
Subjects: Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
Online Access: Order full text
description Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite its great empirical success, the reason behind it is far from being well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.
doi_str_mv 10.48550/arxiv.1909.04653
creationdate 2019-09-10
source arXiv.org
subjects Computer Science - Learning
Mathematics - Optimization and Control
Statistics - Machine Learning