Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks
Deep learning requires several design choices, such as the nodes' activation functions and the widths, types, and arrangements of the layers. One consideration when making these choices is the vanishing-gradient problem, which is the phenomenon of algorithms getting stuck at suboptimal points due to small gradients. In this paper, we revisit the vanishing-gradient problem in the context of sigmoid-type activation. We use mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling. We then demonstrate the effectiveness of the two remedies in practice. In view of the vanishing-gradient problem being a main reason why tanh and other sigmoid-type activation has become much less popular than relu-type activation, our results bring sigmoid-type activation back to the table.
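As a rough illustration of the abstract's first source of the phenomenon, the sketch below (a minimal NumPy example, not code from the paper; the depth, width, and the two weight scales are illustrative assumptions) backpropagates a simple squared-norm loss through a stack of tanh layers and compares the first-layer gradient norm for large versus rescaled weights.

```python
# A minimal sketch, not code from the paper: it illustrates the claim that large
# individual parameters push tanh units into saturation (derivative ~ 0) so that
# gradients vanish, while rescaling the weights to a smaller magnitude keeps the
# gradients usable. Depth, width, and the two weight scales are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def first_layer_grad_norm(weight_scale, depth=10, width=32):
    """Backpropagate the loss 0.5 * ||h_L||^2 through `depth` tanh layers and
    return the gradient norm at the first layer's weight matrix."""
    x = rng.standard_normal(width)
    weights = [weight_scale * rng.standard_normal((width, width)) for _ in range(depth)]

    # Forward pass, keeping each layer's input and pre-activation for backprop.
    inputs, pre_acts = [], []
    h = x
    for W in weights:
        inputs.append(h)
        z = W @ h
        pre_acts.append(z)
        h = np.tanh(z)

    # Backward pass.
    delta = h                                                # dL/dh_L for L(h) = 0.5 * ||h||^2
    grad_first = None
    for l in reversed(range(depth)):
        delta = delta * (1.0 - np.tanh(pre_acts[l]) ** 2)    # through tanh'
        if l == 0:
            grad_first = np.outer(delta, inputs[l])          # dL/dW_0
        delta = weights[l].T @ delta                         # pass to layer l-1

    return np.linalg.norm(grad_first)

# Large parameters saturate tanh and the first-layer gradient all but disappears;
# the same architecture with rescaled (smaller) weights keeps a noticeably larger gradient.
print("weight scale 5.0 :", first_layer_grad_norm(5.0))
print("weight scale 0.2 :", first_layer_grad_norm(0.2))
```

In the spirit of the abstract, the paper's other remedy, regularization, works toward the same end during training by penalizing large parameters, whereas the rescaling above can be read as enforcing small parameters at initialization.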
Saved in:
Published in: | arXiv.org 2021-06 |
---|---|
Main authors: | Ven, Leni; Lederer, Johannes |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Machine learning; Mathematical analysis; Regularization; Rescaling |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Ven, Leni; Lederer, Johannes |
date | 2021-06-04 |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2021-06 |
issn | 2331-8422 |
language | eng |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
source | Free E-Journals |
subjects | Algorithms; Machine learning; Mathematical analysis; Regularization; Rescaling |
title | Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks |