Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2022-09
Hauptverfasser:	Sakamoto, Keitaro, Sato, Issei
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial neural networks Bayesian analysis Hypotheses Iterative methods Machine learning Minima
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Sakamoto, Keitaro Sato, Issei
description	The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2665376645</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2665376645</sourcerecordid><originalsourceid>FETCH-proquest_journals_26653766453</originalsourceid><addsrcrecordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665376645</pqid></control><display><type>article</type><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><source>Free E- Journals</source><creator>Sakamoto, Keitaro ; Sato, Issei</creator><creatorcontrib>Sakamoto, Keitaro ; Sato, Issei</creatorcontrib><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Artificial neural networks ; Bayesian analysis ; Hypotheses ; Iterative methods ; Machine learning ; Minima</subject><ispartof>arXiv.org, 2022-09</ispartof><rights>2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>777,781</link.rule.ids></links><search><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><title>arXiv.org</title><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Bayesian analysis</subject><subject>Hypotheses</subject><subject>Iterative methods</subject><subject>Machine learning</subject><subject>Minima</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</recordid><startdate>20220928</startdate><enddate>20220928</enddate><creator>Sakamoto, Keitaro</creator><creator>Sato, Issei</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20220928</creationdate><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><author>Sakamoto, Keitaro ; Sato, Issei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_26653766453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Bayesian analysis</topic><topic>Hypotheses</topic><topic>Iterative methods</topic><topic>Machine learning</topic><topic>Minima</topic><toplevel>online_resources</toplevel><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sakamoto, Keitaro</au><au>Sato, Issei</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</atitle><jtitle>arXiv.org</jtitle><date>2022-09-28</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2022-09
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2665376645
source	Free E- Journals
subjects	Algorithms Artificial neural networks Bayesian analysis Hypotheses Iterative methods Machine learning Minima
title	Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T01%3A05%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Analyzing%20Lottery%20Ticket%20Hypothesis%20from%20PAC-Bayesian%20Theory%20Perspective&rft.jtitle=arXiv.org&rft.au=Sakamoto,%20Keitaro&rft.date=2022-09-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2665376645%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665376645&rft_id=info:pmid/&rfr_iscdi=true