Overlapping Communication With Computation in Parameter Server for Scalable DL Training

Scalability of distributed deep learning (DL) training with the parameter server (PS) architecture is often communication constrained in large clusters. Recent efforts use a layer-by-layer strategy to overlap gradient communication with backward computation so as to reduce the impact of the communication constraint on scalability. However, these approaches can introduce significant overhead in gradient communication, and they cannot be effectively applied to the overlap between parameter communication and forward computation. In this article, we propose and develop iPart, a novel approach that partitions communication and computation in various partition sizes to overlap gradient communication with backward computation and parameter communication with forward computation. iPart formulates the partitioning decision as an optimization problem and solves it with a greedy algorithm to derive communication and computation partitions. We implement iPart in the open-source DL framework BigDL and evaluate it with various DL workloads. Experimental results show that iPart improves the scalability of a 72-node cluster by up to 94 percent over the default PS and 52 percent over the layer-by-layer strategy.
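The abstract describes two ideas: pushing gradients to the parameter server in fixed-size partitions so that sending one partition overlaps with computing the next, and choosing the partition size by optimizing a cost model. The Python sketch below is a minimal, self-contained illustration of both ideas under stated assumptions. The cost constants, the function names (modeled_iteration_time, choose_partition_size, train_iteration), and the brute-force candidate search standing in for the paper's greedy algorithm are all illustrative assumptions, not the authors' iPart/BigDL implementation.

# Minimal sketch of the partition-and-overlap idea from the abstract.
# All numbers and names are hypothetical; this is not the iPart/BigDL code.

import math
import threading
import time
from queue import Queue

# Hypothetical per-MB costs (seconds) for the toy model.
BACKWARD_TIME_PER_MB = 0.004   # backward compute per MB of gradients
NETWORK_TIME_PER_MB  = 0.006   # transfer time per MB pushed to the PS
PER_MESSAGE_OVERHEAD = 0.002   # fixed cost of each push (latency, serialization)


def modeled_iteration_time(total_mb, part_mb):
    """Toy pipeline model: the first partition's compute is exposed, after which
    the slower of (remaining compute) and (all communication) dominates."""
    n_parts = math.ceil(total_mb / part_mb)
    compute = total_mb * BACKWARD_TIME_PER_MB
    comm = total_mb * NETWORK_TIME_PER_MB + n_parts * PER_MESSAGE_OVERHEAD
    first = min(part_mb, total_mb) * BACKWARD_TIME_PER_MB
    return first + max(compute - first, comm)


def choose_partition_size(total_mb, candidates=(1, 2, 4, 8, 16, 32)):
    """Stand-in for the paper's greedy search: pick the candidate partition
    size with the lowest modeled iteration time."""
    return min(candidates, key=lambda p: modeled_iteration_time(total_mb, p))


def train_iteration(layer_grad_mb, part_mb):
    """Simulate one iteration: the backward pass fills a gradient buffer layer
    by layer; a sender thread pushes a partition as soon as part_mb worth of
    gradients is available, so communication overlaps the rest of the pass."""
    sends = Queue()

    def sender():
        while True:
            size = sends.get()
            if size is None:
                break
            time.sleep(size * NETWORK_TIME_PER_MB + PER_MESSAGE_OVERHEAD)  # "push to PS"

    worker = threading.Thread(target=sender)
    worker.start()

    pending = 0.0
    for grad_mb in layer_grad_mb:                    # backward pass, last layer first
        time.sleep(grad_mb * BACKWARD_TIME_PER_MB)   # "compute this layer's gradient"
        pending += grad_mb
        while pending >= part_mb:                    # partition boundary is independent of layer boundaries
            sends.put(part_mb)
            pending -= part_mb
    if pending > 0:
        sends.put(pending)                           # flush the tail partition
    sends.put(None)
    worker.join()


if __name__ == "__main__":
    layers = [4, 4, 8, 16, 32]                       # hypothetical per-layer gradient sizes (MB)
    best = choose_partition_size(sum(layers))
    print("chosen partition size (MB):", best)
    start = time.time()
    train_iteration(layers, best)
    print("simulated iteration time: %.3f s" % (time.time() - start))

The toy cost model only illustrates the trade-off the abstract points at: smaller partitions expose less computation but pay more per-message overhead, larger partitions do the opposite. The paper's greedy algorithm operates on the actual optimization formulation and real profiles rather than fixed constants.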

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2021-09, Vol. 32 (9), pp. 2144-2159
Authors: Wang, Shaoqi; Pi, Aidi; Zhou, Xiaobo; Wang, Jun; Xu, Cheng-Zhong
Format: Article
Language: English
DOI: 10.1109/TPDS.2021.3062721
ISSN: 1045-9219
eISSN: 1558-2183
Source: IEEE Electronic Library (IEL)
Subjects:
backward computation
Communication
Computation
Computational modeling
Computer architecture
Constraints
forward computation
gradient communication
Greedy algorithms
Machine learning
Neural networks
Optimization
parameter communication
Parameter server
Parameters
Partitions
Scalability
Servers
Synchronization
Training