Training neural networks using Metropolis Monte Carlo and an adaptive variant

We examine the zero-temperature Metropolis Monte Carlo (MC) algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis MC can train a neural net with an accuracy comparable to that of gradient descent (GD), if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm (aMC) to overcome these limitations. The intrinsic stochasticity and numerical stability of the MC method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by GD. MC methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
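The core idea in the abstract can be illustrated in a few lines of code. Below is a minimal sketch of a zero-temperature Metropolis MC training loop for a small network; the toy data, network size, step size, and all variable names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical stand-in for a real training set).
X = rng.normal(size=(128, 4))
y = np.sin(X @ rng.normal(size=4))

H = 8                      # hidden units (illustrative choice)
n_params = 4 * H + H + H   # W1 (4 x H), b1 (H), W2 (H)

def loss(w):
    """Mean-squared error of a one-hidden-layer tanh net with flat weights w."""
    W1 = w[: 4 * H].reshape(4, H)
    b1 = w[4 * H : 5 * H]
    W2 = w[5 * H :]
    return np.mean((np.tanh(X @ W1 + b1) @ W2 - y) ** 2)

w = rng.normal(scale=0.1, size=n_params)
best = loss(w)
sigma = 0.01  # proposal step size (fixed here; aMC adapts the proposal)

for step in range(20000):
    trial = w + rng.normal(scale=sigma, size=n_params)  # random perturbation
    trial_loss = loss(trial)
    if trial_loss <= best:   # zero temperature: accept only non-increasing moves
        w, best = trial, trial_loss
```

At zero temperature the usual Metropolis acceptance probability min(1, exp(-ΔL/T)) reduces to accepting only moves that do not increase the loss. The paper's adaptive variant (aMC) goes further by adapting the proposal distribution during training to cope with strongly heterogeneous networks; the fixed global step size above is the simplest non-adaptive baseline.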

Bibliographic Details
Published in: Machine Learning: Science and Technology, 2022-12, Vol. 3 (4), p. 45026
Main authors: Whitelam, Stephen; Selin, Viktor; Benlolo, Ian; Casert, Corneel; Tamblyn, Isaac
Format: Article
Language: English
Subjects: adaptive; Adaptive algorithms; Algorithms; Artificial neural networks; Computer architecture; gradients; Metropolis Monte Carlo; Neural networks; Numerical stability; optimization; Recurrent neural networks; Training
Online access: Full text
DOI: 10.1088/2632-2153/aca6cd
ISSN: 2632-2153
Source: IOP Publishing Free Content; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals