Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that...

Bibliographic Details
Published in: arXiv.org 2022-09
Main Authors: Sakamoto, Keitaro; Sato, Issei
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Sakamoto, Keitaro
Sato, Issei
description The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2665376645</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2665376645</sourcerecordid><originalsourceid>FETCH-proquest_journals_26653766453</originalsourceid><addsrcrecordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665376645</pqid></control><display><type>article</type><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><source>Free E- Journals</source><creator>Sakamoto, Keitaro ; Sato, Issei</creator><creatorcontrib>Sakamoto, Keitaro ; Sato, Issei</creatorcontrib><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. 
On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Artificial neural networks ; Bayesian analysis ; Hypotheses ; Iterative methods ; Machine learning ; Minima</subject><ispartof>arXiv.org, 2022-09</ispartof><rights>2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>777,781</link.rule.ids></links><search><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><title>arXiv.org</title><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. 
However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Bayesian analysis</subject><subject>Hypotheses</subject><subject>Iterative methods</subject><subject>Machine learning</subject><subject>Minima</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</recordid><startdate>20220928</startdate><enddate>20220928</enddate><creator>Sakamoto, Keitaro</creator><creator>Sato, Issei</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20220928</creationdate><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><author>Sakamoto, Keitaro ; Sato, Issei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_26653766453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Bayesian analysis</topic><topic>Hypotheses</topic><topic>Iterative methods</topic><topic>Machine learning</topic><topic>Minima</topic><toplevel>online_resources</toplevel><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One 
Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sakamoto, Keitaro</au><au>Sato, Issei</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</atitle><jtitle>arXiv.org</jtitle><date>2022-09-28</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. 
Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-09
issn 2331-8422
language eng
recordid cdi_proquest_journals_2665376645
source Free E- Journals
subjects Algorithms
Artificial neural networks
Bayesian analysis
Hypotheses
Iterative methods
Machine learning
Minima
title Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T01%3A05%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Analyzing%20Lottery%20Ticket%20Hypothesis%20from%20PAC-Bayesian%20Theory%20Perspective&rft.jtitle=arXiv.org&rft.au=Sakamoto,%20Keitaro&rft.date=2022-09-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2665376645%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665376645&rft_id=info:pmid/&rfr_iscdi=true
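
The description field above characterizes iterative magnitude pruning (IMP) as an algorithm that finds sparse networks trainable from the initial weights. As a rough illustration only, the loop can be sketched on a toy linear regression problem. This is not the authors' implementation: the record does not give one, the paper concerns deep networks such as ResNet, and the function names (`train`, `iterative_magnitude_pruning`), the 50% pruning fraction, the learning rate, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def train(w_init, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    w = w_init * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad * mask  # masked update keeps pruned entries at zero
    return w


def iterative_magnitude_pruning(w_init, X, y, rounds=3, prune_frac=0.5):
    """Hypothetical IMP loop: train, prune smallest weights, rewind, repeat."""
    mask = np.ones_like(w_init)
    for _ in range(rounds):
        # Each round restarts from w_init, i.e. weights are rewound to initialization.
        w = train(w_init, mask, X, y)
        alive = np.flatnonzero(mask)
        n_prune = int(len(alive) * prune_frac)
        # Drop the surviving weights with the smallest trained magnitude.
        drop = alive[np.argsort(np.abs(w[alive]))[:n_prune]]
        mask[drop] = 0.0
    # The "ticket" is the final mask applied to the initial weights, then retrained.
    return mask, train(w_init, mask, X, y)


# Synthetic data (assumption): only the first two of 16 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
w_true = np.zeros(16)
w_true[:2] = [3.0, -2.0]
y = X @ w_true

w_init = rng.normal(scale=0.1, size=16)
mask, w_ticket = iterative_magnitude_pruning(w_init, X, y)
```

In this toy setting the mask shrinks from 16 to 8 to 4 to 2 surviving weights over three rounds. The sketch says nothing about flatness or sharpness of minima, which is the question the paper studies.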