Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1-RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
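
For orientation, the model named in the abstract can be stated compactly. The following is the standard negative-margin perceptron setup; the notation is supplied here for the reader and is not quoted from the record. Given $P = \alpha N$ random input patterns $\xi^\mu \in \mathbb{R}^N$, a solution is a weight vector $w$ satisfying every constraint

\[
  \frac{1}{\sqrt{N}} \, w \cdot \xi^\mu \ge \kappa, \qquad \mu = 1, \dots, P,
\]

with a fixed margin $\kappa < 0$, where $w \in \{-1, +1\}^N$ in the binary model and $\|w\|_2^2 = N$ in the spherical (continuous) model. The constraint density $\alpha$ is the control parameter at which the threshold phenomena described in the abstract occur.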

Bibliographic Details
Published in: arXiv.org, 2023-07
Main Authors: Baldassi, Carlo; Malatesta, Enrico M; Perugini, Gabriele; Zecchina, Riccardo
Format: Article
Language: English
Subjects: Algorithms; Clusters; Learning; Minima; Neural networks; Robustness (mathematics)
Online Access: Full text
EISSN: 2331-8422
Source: Free E-Journals
Publisher: Cornell University Library, arXiv.org (Ithaca)
Publication Date: 2023-07-24
Rights: 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A31%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Typical%20and%20atypical%20solutions%20in%20non-convex%20neural%20networks%20with%20discrete%20and%20continuous%20weights&rft.jtitle=arXiv.org&rft.au=Baldassi,%20Carlo&rft.date=2023-07-24&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2807203971%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2807203971&rft_id=info:pmid/&rfr_iscdi=true