Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
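The algorithmic scheme sketched in the abstract amounts to replacing the original cost E(w) with a replicated cost over y interacting copies of the weights,

    E_R(w^1, ..., w^y) = \sum_{a=1}^{y} E(w^a) + \gamma \sum_{a=1}^{y} d(w^a, \tilde{w}),

where d is a distance and the center \tilde{w} (the driving assignment) couples the replicas. The sketch below illustrates this general idea only and is not the authors' implementation: the squared-distance coupling, the fixed coupling strength gamma, the use of the replica mean as the driving assignment, and all function names are assumptions made here for concreteness.

    import numpy as np

    def replicated_gradient_descent(grad_E, w0, n_replicas=3, gamma=0.5,
                                    lr=0.05, steps=500, seed=0):
        # grad_E(w): gradient of the original cost E at weights w.
        # Each replica descends E plus an elastic term pulling it toward
        # the replica mean, which here plays the driving assignment's role.
        rng = np.random.default_rng(seed)
        replicas = [w0 + 0.1 * rng.standard_normal(w0.shape)
                    for _ in range(n_replicas)]
        for _ in range(steps):
            center = np.mean(replicas, axis=0)  # driving assignment
            replicas = [w - lr * (grad_E(w) + gamma * (w - center))
                        for w in replicas]
        return np.mean(replicas, axis=0)

    # Toy usage on a rugged 1D cost E(w) = (w - 1)^2 + sin(5 w),
    # whose gradient is 2 (w - 1) + 5 cos(5 w).
    grad = lambda w: 2 * (w - 1.0) + 5 * np.cos(5 * w)
    w_star = replicated_gradient_descent(grad, np.array([4.0]))

The intuition, per the abstract, is that the coupled copies cannot all sit in an isolated minimum, so the descent is biased toward the wide, dense regions of near-optimal configurations that the paper identifies.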

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS, 2016-11, Vol. 113 (48), p. E7655-E7662
Authors: Baldassi, Carlo; Borgs, Christian; Chayes, Jennifer T.; Ingrosso, Alessandro; Lucibello, Carlo; Saglietti, Luca; Zecchina, Riccardo
Publisher: National Academy of Sciences (United States)
Format: Article
Language: English
DOI: 10.1073/pnas.1608103113
ISSN: 0027-8424
EISSN: 1091-6490
PMID: 27856745
Subjects: Algorithms; Heuristic; Markov analysis; Markov chains; Neural networks; Optimization algorithms; Physical Sciences; PNAS Plus
Online access: Full text (Jstor Complete Legacy; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry)