Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

By driving models to converge to flat minima, sharpness-aware learning algorithms such as SAM have shown the power to achieve state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases the computational burden, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an RST optimizer performs a Bernoulli trial to choose randomly between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with the probability given by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall count of extra propagation pairs can be largely reduced. We also give a theoretical analysis of the convergence of RST. We then empirically study the computation cost and effect of various types of scheduling functions and give directions on setting appropriate ones. Further, we extend RST to a general framework (G-RST), in which the degree of regularization on sharpness can be adjusted freely for any scheduling function. We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
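The step-selection rule described in the abstract is simple enough to sketch. The following minimal Python sketch is illustrative only, not the authors' implementation: `sgd_step`, `sam_step`, and `schedule` are hypothetical placeholders for a base SGD update, a SAM update (which costs one extra forward-backward pass), and the predefined scheduling function returning the SAM probability at iteration t.

```python
import random

def rst_train(model, loader, epochs, schedule, sgd_step, sam_step):
    """Randomized Sharpness-Aware Training loop (illustrative sketch).

    At iteration t, a Bernoulli trial with probability p = schedule(t)
    decides between a sharpness-aware (SAM) step, which costs one extra
    forward-backward pass, and a plain base (SGD) step.
    """
    t = 0
    for _ in range(epochs):
        for batch in loader:
            p = schedule(t)             # SAM probability at iteration t
            if random.random() < p:
                sam_step(model, batch)  # two forward-backward passes
            else:
                sgd_step(model, batch)  # one forward-backward pass
            t += 1
    return model
```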

Detailed description

Saved in:
Bibliographic details
Main authors: Zhao, Yang; Zhang, Hao; Hu, Xiuyuan
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Online access: Request full text
creator Zhao, Yang
Zhang, Hao
Hu, Xiuyuan
description By driving models to converge to flat minima, sharpness-aware learning algorithms such as SAM have shown the power to achieve state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases the computational burden, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an RST optimizer performs a Bernoulli trial to choose randomly between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with the probability given by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall count of extra propagation pairs can be largely reduced. We also give a theoretical analysis of the convergence of RST. We then empirically study the computation cost and effect of various types of scheduling functions and give directions on setting appropriate ones. Further, we extend RST to a general framework (G-RST), in which the degree of regularization on sharpness can be adjusted freely for any scheduling function. We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
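As a rough illustration of how a scheduling function governs the extra cost, the sketch below defines two hypothetical schedules (a constant probability and a linear ramp; the schedules actually studied in the paper are not specified in this record) and computes the expected number of extra forward-backward pairs. A constant probability of 0.5 would, on average, halve the extra cost relative to SAM, consistent with the roughly 50% saving quoted in the abstract.

```python
def constant_schedule(p):
    """Take a SAM step with fixed probability p at every iteration."""
    return lambda t: p

def linear_ramp_schedule(total_iters, p_start=0.0, p_end=1.0):
    """Linearly increase the SAM probability from p_start to p_end."""
    return lambda t: p_start + (p_end - p_start) * min(t / total_iters, 1.0)

def expected_extra_passes(schedule, total_iters):
    """Expected number of extra forward-backward pairs over training,
    relative to plain SGD (SAM itself would incur total_iters of them)."""
    return sum(schedule(t) for t in range(total_iters))

if __name__ == "__main__":
    sched = constant_schedule(0.5)
    # About 5000 extra pairs over 10000 iterations, i.e. half of SAM's overhead.
    print(expected_extra_passes(sched, 10_000))
```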
doi_str_mv 10.48550/arxiv.2203.09962
format Article
identifier DOI: 10.48550/arxiv.2203.09962
language eng
recordid cdi_arxiv_primary_2203_09962
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
title Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
url https://arxiv.org/abs/2203.09962