$\alpha$-Stable convergence of heavy-/light-tailed infinitely wide neural networks

We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically seen to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to naturally arise via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions, $\alpha\in(0,2]$.
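
The limit described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: the function name, the tanh activation, the fixed input, and the per-layer $n^{-1/\alpha}$ scaling are illustrative assumptions standing in for the paper's "suitable scaling" and regularity conditions. It initializes a width-$n$ hidden layer with i.i.d. symmetric $\alpha$-stable weights and biases and records one second-layer pre-activation across many independent initializations; as $n$ grows, these samples should look increasingly like draws from a symmetric $\alpha$-stable law.

```python
# Minimal simulation sketch (assumptions noted above, not the paper's construction).
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def layer2_preactivation_samples(alpha, d=5, n=1000, reps=200, scale=1.0):
    """One second-layer pre-activation for `reps` independent initializations."""
    x = np.ones(d)                                     # fixed network input
    out = np.empty(reps)
    for r in range(reps):
        # Layer-1 weights and biases: symmetric alpha-stable (skewness beta = 0).
        W1 = levy_stable.rvs(alpha, 0.0, scale=scale, size=(n, d), random_state=rng)
        b1 = levy_stable.rvs(alpha, 0.0, scale=scale, size=n, random_state=rng)
        z1 = d ** (-1.0 / alpha) * (W1 @ x) + b1       # layer-1 pre-activations
        h1 = np.tanh(z1)                               # bounded nonlinearity
        # Layer-2 weights and bias for a single node, same alpha as the layer.
        w2 = levy_stable.rvs(alpha, 0.0, scale=scale, size=n, random_state=rng)
        b2 = levy_stable.rvs(alpha, 0.0, scale=scale, random_state=rng)
        out[r] = n ** (-1.0 / alpha) * (w2 @ h1) + b2  # scaled layer-2 pre-activation
    return out

samples = layer2_preactivation_samples(alpha=1.5, n=1000)
# Heavy tails show up in the extreme quantiles; a Gaussian (alpha = 2) init would not produce them.
print(np.quantile(samples, [0.01, 0.5, 0.99]))
```

Under the abstract's domain-of-attraction assumption, the exact stable draws above could presumably be replaced by a non-stable heavy-tailed law with the same tail index (for instance a Student-t with $\nu=\alpha<2$ degrees of freedom) and the same limiting behaviour would be expected; setting $\alpha=2$ recovers the usual Gaussian, finite-variance infinite-width regime.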

Full description

Saved in:
Bibliographic details
Published in: Advances in Applied Probability, 2023-12, Vol. 55 (4), pp. 1415-1441
Main authors: Jung, Paul; Lee, Hoil; Lee, Jiho; Yang, Hongseok
Format: Article
Language: English
Subjects: Original Article
Online access: Full text
DOI: 10.1017/apr.2023.3
Publisher: Cambridge University Press, Cambridge, UK
Rights: The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust
ORCID: https://orcid.org/0000-0003-2786-0441
ISSN: 0001-8678
EISSN: 1475-6064
Source: Cambridge Journals