Overlapping Communication With Computation in Parameter Server for Scalable DL Training

Scalability of distributed deep learning (DL) training with the parameter server (PS) architecture is often communication constrained in large clusters. Recent efforts use a layer-by-layer strategy to overlap gradient communication with backward computation so as to reduce the impact of the communication constraint on scalability. However, these approaches can introduce significant overhead in gradient communication, and they cannot be effectively applied to the overlap between parameter communication and forward computation. In this article, we propose and develop iPart, a novel approach that partitions communication and computation in various partition sizes to overlap gradient communication with backward computation and parameter communication with forward computation. iPart formulates the partitioning decision as an optimization problem and solves it with a greedy algorithm to derive communication and computation partitions. We implement iPart in the open-source DL framework BigDL and evaluate it with various DL workloads. Experimental results show that iPart improves the scalability of a 72-node cluster by up to 94 percent over the default PS and 52 percent over the layer-by-layer strategy.
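The abstract describes two ideas: pushing gradients to the parameter server in fixed-size partitions so that sending one partition overlaps with computing the next, and choosing the partition size by optimizing a cost model. The Python sketch below is a minimal, self-contained illustration of both ideas under stated assumptions. The cost constants, the function names (modeled_iteration_time, choose_partition_size, train_iteration), and the brute-force candidate search standing in for the paper's greedy algorithm are all illustrative assumptions, not the authors' iPart/BigDL implementation.

# Minimal sketch of the partition-and-overlap idea from the abstract.
# All numbers and names are hypothetical; this is not the iPart/BigDL code.

import math
import threading
import time
from queue import Queue

# Hypothetical per-MB costs (seconds) for the toy model.
BACKWARD_TIME_PER_MB = 0.004   # backward compute per MB of gradients
NETWORK_TIME_PER_MB  = 0.006   # transfer time per MB pushed to the PS
PER_MESSAGE_OVERHEAD = 0.002   # fixed cost of each push (latency, serialization)


def modeled_iteration_time(total_mb, part_mb):
    """Toy pipeline model: the first partition's compute is exposed, after which
    the slower of (remaining compute) and (all communication) dominates."""
    n_parts = math.ceil(total_mb / part_mb)
    compute = total_mb * BACKWARD_TIME_PER_MB
    comm = total_mb * NETWORK_TIME_PER_MB + n_parts * PER_MESSAGE_OVERHEAD
    first = min(part_mb, total_mb) * BACKWARD_TIME_PER_MB
    return first + max(compute - first, comm)


def choose_partition_size(total_mb, candidates=(1, 2, 4, 8, 16, 32)):
    """Stand-in for the paper's greedy search: pick the candidate partition
    size with the lowest modeled iteration time."""
    return min(candidates, key=lambda p: modeled_iteration_time(total_mb, p))


def train_iteration(layer_grad_mb, part_mb):
    """Simulate one iteration: the backward pass fills a gradient buffer layer
    by layer; a sender thread pushes a partition as soon as part_mb worth of
    gradients is available, so communication overlaps the rest of the pass."""
    sends = Queue()

    def sender():
        while True:
            size = sends.get()
            if size is None:
                break
            time.sleep(size * NETWORK_TIME_PER_MB + PER_MESSAGE_OVERHEAD)  # "push to PS"

    worker = threading.Thread(target=sender)
    worker.start()

    pending = 0.0
    for grad_mb in layer_grad_mb:                    # backward pass, last layer first
        time.sleep(grad_mb * BACKWARD_TIME_PER_MB)   # "compute this layer's gradient"
        pending += grad_mb
        while pending >= part_mb:                    # partition boundary is independent of layer boundaries
            sends.put(part_mb)
            pending -= part_mb
    if pending > 0:
        sends.put(pending)                           # flush the tail partition
    sends.put(None)
    worker.join()


if __name__ == "__main__":
    layers = [4, 4, 8, 16, 32]                       # hypothetical per-layer gradient sizes (MB)
    best = choose_partition_size(sum(layers))
    print("chosen partition size (MB):", best)
    start = time.time()
    train_iteration(layers, best)
    print("simulated iteration time: %.3f s" % (time.time() - start))

The toy cost model only illustrates the trade-off the abstract points at: smaller partitions expose less computation but pay more per-message overhead, larger partitions do the opposite. The paper's greedy algorithm operates on the actual optimization formulation and real profiles rather than fixed constants.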

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2021-09, Vol. 32 (9), pp. 2144-2159
Authors: Wang, Shaoqi; Pi, Aidi; Zhou, Xiaobo; Wang, Jun; Xu, Cheng-Zhong
Format: Article
Language: English
DOI: 10.1109/TPDS.2021.3062721
ISSN: 1045-9219
eISSN: 1558-2183
Source: IEEE Electronic Library (IEL)
Subjects:
backward computation
Communication
Computation
Computational modeling
Computer architecture
Constraints
forward computation
gradient communication
Greedy algorithms
Machine learning
Neural networks
Optimization
parameter communication
Parameter server
Parameters
Partitions
Scalability
Servers
Synchronization
Training