Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performance but may suffer from severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error was established by Foret et al. (2020), who proposed the Sharpness Aware Minimizer (SAM) to mitigate the degradation of generalization. Unfortunately, SAM's computational cost is roughly double that of base optimizers such as Stochastic Gradient Descent (SGD). This paper therefore proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM comprises two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation SAM requires over base optimizers from 100% to 40%, while test accuracies are preserved or even improved.
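
The two strategies in the abstract can be pictured with a short PyTorch-style sketch of a single training step. This is a minimal illustration only: the function name esam_step, the hyperparameters rho, beta and gamma, the per-parameter random masking, and the top-k sample selection rule are simplifying assumptions made here for exposition, not the authors' released implementation.

```python
# Minimal sketch of an ESAM-like update: Stochastic Weight Perturbation (SWP)
# plus Sharpness-Sensitive Data Selection (SDS) on top of a base optimizer.
# esam_step, rho, beta, gamma and the top-k rule are illustrative assumptions.
import torch
import torch.nn.functional as F


def esam_step(model, optimizer, inputs, targets, rho=0.05, beta=0.5, gamma=0.5):
    # First pass: per-sample loss and gradients on the full mini-batch.
    logits = model(inputs)
    per_sample_loss = F.cross_entropy(logits, targets, reduction="none")
    per_sample_loss.mean().backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    perturbations = []
    with torch.no_grad():
        # SWP: perturb only a randomly chosen ~beta fraction of the weights
        # in the gradient-ascent direction used by SAM.
        for p in model.parameters():
            if p.grad is None:
                continue
            mask = (torch.rand_like(p) < beta).float()
            e_w = rho * mask * p.grad / (grad_norm + 1e-12) / beta
            p.add_(e_w)                       # w -> w + e_w
            perturbations.append((p, e_w))

        # SDS: keep the gamma fraction of samples whose loss rises most under
        # the perturbation, i.e. the samples most sensitive to sharpness.
        perturbed_loss = F.cross_entropy(model(inputs), targets, reduction="none")
        sharpness = perturbed_loss - per_sample_loss.detach()
        k = max(1, int(gamma * inputs.size(0)))
        selected = torch.topk(sharpness, k).indices

    # Second pass: sharpness-aware gradient on the selected subset only.
    optimizer.zero_grad()
    F.cross_entropy(model(inputs[selected]), targets[selected]).backward()

    # Undo the perturbation and update at the original weights.
    with torch.no_grad():
        for p, e_w in perturbations:
            p.sub_(e_w)
    optimizer.step()
    optimizer.zero_grad()
```

In a training loop, esam_step(model, optimizer, x, y) would take the place of the usual forward/backward/step. With beta = gamma = 1 this sketch reduces to plain SAM; smaller values perturb fewer weights and backpropagate through fewer samples in the second pass, which is where the efficiency gain described in the abstract comes from.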

Bibliographic Details
Published in: arXiv.org, 2022-05
Main authors: Du, Jiawei; Yan, Hanshu; Feng, Jiashi; Zhou, Joey Tianyi; Zhen, Liangli; Goh, Rick Siow Mong; Tan, Vincent Y. F.
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Artificial neural networks; Computing costs; Iterative methods; Neural networks; Perturbation; Sharpness; Training
Online access: Full text