Slow and Stale Gradients Can Win the Race

Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work,...

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE journal on selected areas in information theory, 2021-09, Vol. 2 (3), p. 1012-1024
Main authors: Dutta, Sanghamitra; Wang, Jianyu; Joshi, Gauri
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1024
container_issue 3
container_start_page 1012
container_title IEEE journal on selected areas in information theory
container_volume 2
creator Dutta, Sanghamitra
Wang, Jianyu
Joshi, Gauri
description Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on the CIFAR10 dataset.
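The abstract describes a runtime trade-off between waiting for stragglers (synchronous SGD) and applying stale gradients (asynchronous SGD). The Python sketch below is not the authors' code; it is a minimal simulation, assuming a toy least-squares objective, i.i.d. exponential straggling delays, and illustrative names such as `sync_sgd` and `async_sgd`, that compares the error each scheme reaches within the same simulated wall-clock budget.

```python
# Illustrative sketch (not the paper's code): compare fully synchronous SGD,
# which waits for every worker before each update, with asynchronous SGD,
# where the server applies each gradient as it arrives even though it was
# computed at an older (stale) copy of the parameters. Straggling is modeled
# by i.i.d. exponential compute delays; the objective, learning rate, and
# delay rate are all assumptions chosen for illustration.
import heapq
import itertools

import numpy as np

rng = np.random.default_rng(0)
DIM, WORKERS, LR, BUDGET, BATCH = 20, 8, 0.05, 200.0, 32

# Toy least-squares problem: minimize the mean squared residual of A w = b.
A = rng.normal(size=(200, DIM))
b = A @ rng.normal(size=DIM)

def grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

def error(w):
    return float(np.mean((A @ w - b) ** 2))

def compute_delay():
    return rng.exponential(1.0)  # random straggling time per gradient

def sync_sgd():
    """One step = wait for the slowest of the WORKERS gradients, then average."""
    w, t = np.zeros(DIM), 0.0
    while t < BUDGET:
        t += max(compute_delay() for _ in range(WORKERS))  # pay the slowest worker
        g = np.mean([grad(w, rng.integers(0, 200, BATCH)) for _ in range(WORKERS)], axis=0)
        w -= LR * g
    return error(w)

def async_sgd():
    """Apply each arriving gradient immediately; it may be stale."""
    w, t = np.zeros(DIM), 0.0
    order = itertools.count()  # tie-breaker so the heap never compares arrays
    events = [(compute_delay(), next(order), w.copy()) for _ in range(WORKERS)]
    heapq.heapify(events)
    while t < BUDGET:
        t, _, w_stale = heapq.heappop(events)            # next gradient to arrive
        w -= LR * grad(w_stale, rng.integers(0, 200, BATCH))
        heapq.heappush(events, (t + compute_delay(), next(order), w.copy()))
    return error(w)

print(f"synchronous  SGD error at t={BUDGET:g}: {sync_sgd():.4f}")
print(f"asynchronous SGD error at t={BUDGET:g}: {async_sgd():.4f}")
```

Because the synchronous variant pays the maximum of the workers' delays on every step while the asynchronous one advances with each arriving gradient, the asynchronous run completes many more (stale) updates within the same budget; this is the error-versus-wallclock-time trade-off the paper analyzes.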
doi_str_mv 10.1109/JSAIT.2021.3103770
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 2641-8770
ispartof IEEE journal on selected areas in information theory, 2021-09, Vol.2 (3), p.1012-1024
issn 2641-8770
2641-8770
language eng
recordid cdi_ieee_primary_9509585
source IEEE Electronic Library (IEL)
subjects Algorithms
Asynchronous stochastic gradient descent
Convergence
Delays
distributed machine learning
Error analysis
Optimization
performance analysis
Runtime
Servers
Staling
stragglers
Synchronization
Tradeoffs
Training
title Slow and Stale Gradients Can Win the Race
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T07%3A08%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Slow%20and%20Stale%20Gradients%20Can%20Win%20the%20Race&rft.jtitle=IEEE%20journal%20on%20selected%20areas%20in%20information%20theory&rft.au=Dutta,%20Sanghamitra&rft.date=2021-09-01&rft.volume=2&rft.issue=3&rft.spage=1012&rft.epage=1024&rft.pages=1012-1024&rft.issn=2641-8770&rft.eissn=2641-8770&rft.coden=IJSTL5&rft_id=info:doi/10.1109/JSAIT.2021.3103770&rft_dat=%3Cproquest_RIE%3E2575128624%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2575128624&rft_id=info:pmid/&rft_ieee_id=9509585&rfr_iscdi=true