Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

Deep learning requires several design choices, such as the nodes' activation functions and the widths, types, and arrangements of the layers. One consideration when making these choices is the vanishing-gradient problem, which is the phenomenon of algorithms getting stuck at suboptimal points due to small gradients. In this paper, we revisit the vanishing-gradient problem in the context of sigmoid-type activation. We use mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling. We then demonstrate the effectiveness of the two remedies in practice. In view of the vanishing-gradient problem being a main reason why tanh and other sigmoid-type activation has become much less popular than relu-type activation, our results bring sigmoid-type activation back to the table.
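
The mechanism the abstract describes can be made concrete with a toy computation. The following is a minimal illustrative sketch (not code from the paper): a chain of scalar tanh units in NumPy, where the gradient with respect to the first weight collapses once the shared weight is large enough to saturate the units, but stays at a usable size when the weight is small, which is the regime that explicit regularization or rescaling of the parameters encourages. The depth, the input, and the two weight values are arbitrary choices for illustration.

```python
# Illustrative sketch only (not the paper's code): a depth-L chain of scalar
# tanh units, a_k = tanh(w * a_{k-1}), showing how a large weight w saturates
# the units and makes the gradient w.r.t. the first weight vanish, while a
# small (regularized / rescaled) w keeps it at a usable size.
import numpy as np

def grad_wrt_first_weight(w, depth=10, x=1.0):
    """Return d(output)/d(w_1) for the chain a_k = tanh(w * a_{k-1})."""
    # Forward pass, storing the activations a_0, ..., a_depth.
    acts = [x]
    for _ in range(depth):
        acts.append(np.tanh(w * acts[-1]))

    # Backward pass via the chain rule:
    #   d a_k / d a_{k-1} = (1 - a_k**2) * w    (tanh' at the pre-activation)
    #   d a_1 / d w_1     = (1 - a_1**2) * a_0
    grad = 1.0
    for k in range(depth, 1, -1):
        grad *= (1.0 - acts[k] ** 2) * w
    return grad * (1.0 - acts[1] ** 2) * acts[0]

print(grad_wrt_first_weight(w=3.0))  # large parameters: gradient ~ 1e-16, effectively vanished
print(grad_wrt_first_weight(w=0.5))  # small (rescaled) parameters: gradient ~ 1e-3
```

In the same spirit, penalizing large parameter values during training (regularization) keeps the network in the non-saturated regime; the paper makes both remedies precise and evaluates them empirically.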


Bibliographic Details
Published in: arXiv.org, 2021-06
Main authors: Ven, Leni; Lederer, Johannes
Format: Article
Language: English
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Subjects: Algorithms; Machine learning; Mathematical analysis; Regularization; Rescaling
Source: Free E-Journals
Online access: Full text
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T05%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Regularization%20and%20Reparameterization%20Avoid%20Vanishing%20Gradients%20in%20Sigmoid-Type%20Networks&rft.jtitle=arXiv.org&rft.au=Ven,%20Leni&rft.date=2021-06-04&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2537859349%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2537859349&rft_id=info:pmid/&rfr_iscdi=true