Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2022-09 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Sakamoto, Keitaro Sato, Issei |
description | The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2665376645</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2665376645</sourcerecordid><originalsourceid>FETCH-proquest_journals_26653766453</originalsourceid><addsrcrecordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665376645</pqid></control><display><type>article</type><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><source>Free E- Journals</source><creator>Sakamoto, Keitaro ; Sato, Issei</creator><creatorcontrib>Sakamoto, Keitaro ; Sato, Issei</creatorcontrib><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Artificial neural networks ; Bayesian analysis ; Hypotheses ; Iterative methods ; Machine learning ; Minima</subject><ispartof>arXiv.org, 2022-09</ispartof><rights>2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>777,781</link.rule.ids></links><search><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><title>arXiv.org</title><description>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Bayesian analysis</subject><subject>Hypotheses</subject><subject>Iterative methods</subject><subject>Machine learning</subject><subject>Minima</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjUELgjAYQEcQJOV_GHQWbHOzq0nhoYMH7zLkU2fmbN8M1q_PQz-g04PHg7chAeP8FJ0TxnYkRBziOGYyZULwgBTZpEb_0VNH78Y5sJ5WunmAo4WfjesBNdLWmictszy6KL8KNdGqB7OmJVicoXH6DQeybdWIEP64J8fbtcqLaLbmtQC6ejCLXWdYMykFT6VMBP-v-gJQMzwr</recordid><startdate>20220928</startdate><enddate>20220928</enddate><creator>Sakamoto, Keitaro</creator><creator>Sato, Issei</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20220928</creationdate><title>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</title><author>Sakamoto, Keitaro ; Sato, Issei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_26653766453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Bayesian analysis</topic><topic>Hypotheses</topic><topic>Iterative methods</topic><topic>Machine learning</topic><topic>Minima</topic><toplevel>online_resources</toplevel><creatorcontrib>Sakamoto, Keitaro</creatorcontrib><creatorcontrib>Sato, Issei</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sakamoto, Keitaro</au><au>Sato, Issei</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective</atitle><jtitle>arXiv.org</jtitle><date>2022-09-28</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-09 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2665376645 |
source | Free E- Journals |
subjects | Algorithms Artificial neural networks Bayesian analysis Hypotheses Iterative methods Machine learning Minima |
title | Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T01%3A05%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Analyzing%20Lottery%20Ticket%20Hypothesis%20from%20PAC-Bayesian%20Theory%20Perspective&rft.jtitle=arXiv.org&rft.au=Sakamoto,%20Keitaro&rft.date=2022-09-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2665376645%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665376645&rft_id=info:pmid/&rfr_iscdi=true |