Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

Deep learning requires several design choices, such as the nodes' activation functions and the widths, types, and arrangements of the layers. One consideration when making these choices is the vanishing-gradient problem, which is the phenomenon of algorithms getting stuck at suboptimal points due to small gradients. In this paper, we revisit the vanishing-gradient problem in the context of sigmoid-type activation. We use mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling. We then demonstrate the effectiveness of the two remedies in practice. In view of the vanishing-gradient problem being a main reason why tanh and other sigmoid-type activation has become much less popular than relu-type activation, our results bring sigmoid-type activation back to the table.
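
The mechanism the abstract describes can be made concrete with a toy computation. The following is a minimal illustrative sketch (not code from the paper): a chain of scalar tanh units in NumPy, where the gradient with respect to the first weight collapses once the shared weight is large enough to saturate the units, but stays at a usable size when the weight is small, which is the regime that explicit regularization or rescaling of the parameters encourages. The depth, the input, and the two weight values are arbitrary choices for illustration.

```python
# Illustrative sketch only (not the paper's code): a depth-L chain of scalar
# tanh units, a_k = tanh(w * a_{k-1}), showing how a large weight w saturates
# the units and makes the gradient w.r.t. the first weight vanish, while a
# small (regularized / rescaled) w keeps it at a usable size.
import numpy as np

def grad_wrt_first_weight(w, depth=10, x=1.0):
    """Return d(output)/d(w_1) for the chain a_k = tanh(w * a_{k-1})."""
    # Forward pass, storing the activations a_0, ..., a_depth.
    acts = [x]
    for _ in range(depth):
        acts.append(np.tanh(w * acts[-1]))

    # Backward pass via the chain rule:
    #   d a_k / d a_{k-1} = (1 - a_k**2) * w    (tanh' at the pre-activation)
    #   d a_1 / d w_1     = (1 - a_1**2) * a_0
    grad = 1.0
    for k in range(depth, 1, -1):
        grad *= (1.0 - acts[k] ** 2) * w
    return grad * (1.0 - acts[1] ** 2) * acts[0]

print(grad_wrt_first_weight(w=3.0))  # large parameters: gradient ~ 1e-16, effectively vanished
print(grad_wrt_first_weight(w=0.5))  # small (rescaled) parameters: gradient ~ 1e-3
```

In the same spirit, penalizing large parameter values during training (regularization) keeps the network in the non-saturated regime; the paper makes both remedies precise and evaluates them empirically.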


Bibliographic Details
Published in: arXiv.org, 2021-06
Main authors: Ven, Leni; Lederer, Johannes
Format: Article
Language: English
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Subjects: Algorithms; Machine learning; Mathematical analysis; Regularization; Rescaling
Source: Free E-Journals
Online access: Full text
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T05%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Regularization%20and%20Reparameterization%20Avoid%20Vanishing%20Gradients%20in%20Sigmoid-Type%20Networks&rft.jtitle=arXiv.org&rft.au=Ven,%20Leni&rft.date=2021-06-04&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2537859349%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2537859349&rft_id=info:pmid/&rfr_iscdi=true