Towards Understanding the Importance of Shortcut Connections in Residual Networks
creator | Liu, Tianyi; Chen, Minshuo; Zhou, Mo; Du, Simon S; Zhou, Enlu; Zhao, Tuo |
description | Residual Network (ResNet) is undoubtedly a milestone in deep
learning. ResNet is equipped with shortcut connections between layers and
exhibits efficient training using simple first-order algorithms. Despite its
great empirical success, the reason behind it is far from well understood. In
this paper, we study a two-layer non-overlapping convolutional ResNet.
Training such a network requires solving a non-convex optimization problem
with a spurious local optimum. We show, however, that gradient descent
combined with proper normalization avoids being trapped by the spurious local
optimum and converges to a global optimum in polynomial time, when the weight
of the first layer is initialized at 0 and that of the second layer is
initialized arbitrarily in a ball. Numerical experiments are provided to
support our theory. |
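
As a concrete illustration of the setup the abstract describes, below is a
minimal sketch of a two-layer non-overlapping convolutional ResNet trained by
normalized gradient descent. The sizes, the teacher/student data, the shortcut
entering the pre-activation, the squared loss, the ball radius, and the
per-step renormalization of the second layer are all assumptions made for
illustration; the paper's exact construction and normalization scheme may
differ.

```python
import numpy as np

# Minimal sketch of a two-layer non-overlapping convolutional ResNet,
# following the abstract. Sizes, the teacher/student data, the shortcut
# entering the pre-activation, and the squared loss are all assumptions.
rng = np.random.default_rng(0)
n, k, p = 256, 4, 8          # n samples; k non-overlapping patches of size p
X = rng.standard_normal((n, k, p))

def predict(X, w, a):
    # First layer: one shared filter w applied to every patch, with an
    # identity shortcut feeding the raw patch (here, its mean) into the
    # pre-activation, so the layer is close to the identity when w = 0.
    pre = X @ w + X.mean(axis=2)          # shape (n, k)
    # Second layer: linear read-out a over the k patch responses.
    return np.maximum(pre, 0.0) @ a       # shape (n,)

# Hypothetical teacher network generating the labels.
w_star, a_star = rng.standard_normal(p), rng.standard_normal(k)
y = predict(X, w_star, a_star)

# Initialization as described: first-layer weight at 0, second-layer
# weight arbitrary inside a ball (the unit ball is an arbitrary choice).
w = np.zeros(p)
a = rng.uniform(-1.0, 1.0, size=k)
a /= max(1.0, np.linalg.norm(a))

lr = 1e-2
for step in range(2001):
    pre = X @ w + X.mean(axis=2)
    hidden = np.maximum(pre, 0.0)
    err = hidden @ a - y
    act = (pre > 0).astype(float)         # a.e. derivative of ReLU
    # Gradients of the loss 0.5 * mean(err ** 2).
    grad_w = ((err[:, None] * a * act)[..., None] * X).sum(axis=(0, 1)) / n
    grad_a = (err[:, None] * hidden).sum(axis=0) / n
    w -= lr * grad_w
    a -= lr * grad_a
    # "Proper normalization": one plausible reading -- project the
    # second-layer weight back into the unit ball after every step.
    a /= max(1.0, np.linalg.norm(a))
    if step % 500 == 0:
        print(f"step {step:4d}  loss {0.5 * np.mean(err ** 2):.4f}")
```

Note the role of the shortcut in this sketch: because the raw patch enters the
pre-activation, the hidden units are active at the zero initialization, so the
first gradient step is informative, whereas a plain ReLU layer with w = 0
would start in a flat region. This is an illustration of the mechanism, not a
reproduction of the paper's proof setup.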
format | Article |
creationdate | 2019-09-10 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 (open access) |
url | https://arxiv.org/abs/1909.04653 |
identifier | DOI: 10.48550/arxiv.1909.04653 |
language | eng |
recordid | cdi_arxiv_primary_1909_04653 |
source | arXiv.org |
subjects | Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning |
title | Towards Understanding the Importance of Shortcut Connections in Residual Networks |