Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

By driving models to converge to flat minima, sharpness-aware learning algorithms such as SAM have shown the power to achieve state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases the computational burden, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an RST optimizer performs a Bernoulli trial to choose randomly between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with the probability given by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall count of extra propagation pairs can be largely reduced. We also give a theoretical analysis of the convergence of RST. We then empirically study the computation cost and effect of various types of scheduling functions and give directions on setting appropriate ones. Further, we extend RST to a general framework (G-RST), in which the degree of regularization on sharpness can be adjusted freely for any scheduling function. We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
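The step-selection rule described in the abstract is simple enough to sketch. The following minimal Python sketch is illustrative only, not the authors' implementation: `sgd_step`, `sam_step`, and `schedule` are hypothetical placeholders for a base SGD update, a SAM update (which costs one extra forward-backward pass), and the predefined scheduling function returning the SAM probability at iteration t.

```python
import random

def rst_train(model, loader, epochs, schedule, sgd_step, sam_step):
    """Randomized Sharpness-Aware Training loop (illustrative sketch).

    At iteration t, a Bernoulli trial with probability p = schedule(t)
    decides between a sharpness-aware (SAM) step, which costs one extra
    forward-backward pass, and a plain base (SGD) step.
    """
    t = 0
    for _ in range(epochs):
        for batch in loader:
            p = schedule(t)             # SAM probability at iteration t
            if random.random() < p:
                sam_step(model, batch)  # two forward-backward passes
            else:
                sgd_step(model, batch)  # one forward-backward pass
            t += 1
    return model
```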

Detailed description

Saved in:
Bibliographic details
Main authors: Zhao, Yang; Zhang, Hao; Hu, Xiuyuan
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Online access: Request full text
creator Zhao, Yang
Zhang, Hao
Hu, Xiuyuan
description By driving models to converge to flat minima, sharpness-aware learning algorithms such as SAM have shown the power to achieve state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases the computational burden, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an RST optimizer performs a Bernoulli trial to choose randomly between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with the probability given by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall count of extra propagation pairs can be largely reduced. We also give a theoretical analysis of the convergence of RST. We then empirically study the computation cost and effect of various types of scheduling functions and give directions on setting appropriate ones. Further, we extend RST to a general framework (G-RST), in which the degree of regularization on sharpness can be adjusted freely for any scheduling function. We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
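As a rough illustration of how a scheduling function governs the extra cost, the sketch below defines two hypothetical schedules (a constant probability and a linear ramp; the schedules actually studied in the paper are not specified in this record) and computes the expected number of extra forward-backward pairs. A constant probability of 0.5 would, on average, halve the extra cost relative to SAM, consistent with the roughly 50% saving quoted in the abstract.

```python
def constant_schedule(p):
    """Take a SAM step with fixed probability p at every iteration."""
    return lambda t: p

def linear_ramp_schedule(total_iters, p_start=0.0, p_end=1.0):
    """Linearly increase the SAM probability from p_start to p_end."""
    return lambda t: p_start + (p_end - p_start) * min(t / total_iters, 1.0)

def expected_extra_passes(schedule, total_iters):
    """Expected number of extra forward-backward pairs over training,
    relative to plain SGD (SAM itself would incur total_iters of them)."""
    return sum(schedule(t) for t in range(total_iters))

if __name__ == "__main__":
    sched = constant_schedule(0.5)
    # About 5000 extra pairs over 10000 iterations, i.e. half of SAM's overhead.
    print(expected_extra_passes(sched, 10_000))
```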
doi_str_mv 10.48550/arxiv.2203.09962
format Article
identifier DOI: 10.48550/arxiv.2203.09962
language eng
recordid cdi_arxiv_primary_2203_09962
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
title Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
url https://arxiv.org/abs/2203.09962