Overlapping Communication With Computation in Parameter Server for Scalable DL Training
Scalability of distributed deep learning (DL) training with parameter server (PS) architecture is often communication constrained in large clusters. There are recent efforts that use a layer by layer strategy to overlap gradient communication with backward computation so as to reduce the impact of communication constraint on the scalability.
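The layer-by-layer overlap mentioned in this abstract can be pictured as handing each layer's finished gradient to a background sender while the backward pass continues through the remaining layers. The sketch below is only a minimal illustration of that pattern, not code from the paper; the layer count, timings, and the compute/send stubs are hypothetical placeholders.

```python
"""Minimal sketch of layer-by-layer overlap of gradient communication with
backward computation (hypothetical stubs, not the paper's implementation)."""
import queue
import threading
import time

def backward_layer(layer_id):
    # Placeholder for one layer's backward computation.
    time.sleep(0.01)
    return f"grad[{layer_id}]"

def send_gradient(grad):
    # Placeholder for pushing one layer's gradient to the parameter server.
    time.sleep(0.02)

def sender(grad_queue):
    # Drain the queue until the training loop signals completion with None.
    while True:
        grad = grad_queue.get()
        if grad is None:
            break
        send_gradient(grad)

if __name__ == "__main__":
    grads = queue.Queue()
    t = threading.Thread(target=sender, args=(grads,))
    t.start()
    # Backward pass visits layers from last to first; each finished gradient
    # is queued immediately so its communication overlaps later computation.
    for layer_id in reversed(range(8)):
        grads.put(backward_layer(layer_id))
    grads.put(None)  # sentinel: tell the sender no more gradients are coming
    t.join()
    print("backward pass and gradient communication finished")
```

The abstract notes that such per-layer approaches can add significant communication overhead, since every small push pays its own messaging cost; this is what motivates iPart's explicit choice of partition sizes described below.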
| Published in: | IEEE Transactions on Parallel and Distributed Systems, 2021-09, Vol. 32 (9), p. 2144-2159 |
|---|---|
| Main Authors: | Wang, Shaoqi; Pi, Aidi; Zhou, Xiaobo; Wang, Jun; Xu, Cheng-Zhong |
| Format: | Article |
| Language: | English |
| Subjects: | backward computation; Communication; Computation; Computational modeling; Computer architecture; Constraints; forward computation; gradient communication; Greedy algorithms; Machine learning; Neural networks; Optimization; parameter communication; Parameter server; Parameters; Partitions; Scalability; Servers; Synchronization; Training |
| Online Access: | Order full text |
| Field | Value |
|---|---|
| container_end_page | 2159 |
| container_issue | 9 |
| container_start_page | 2144 |
| container_title | IEEE transactions on parallel and distributed systems |
| container_volume | 32 |
| creator | Wang, Shaoqi; Pi, Aidi; Zhou, Xiaobo; Wang, Jun; Xu, Cheng-Zhong |
| description | Scalability of distributed deep learning (DL) training with parameter server (PS) architecture is often communication constrained in large clusters. There are recent efforts that use a layer by layer strategy to overlap gradient communication with backward computation so as to reduce the impact of communication constraint on the scalability. However, the approaches could bring significant overhead in gradient communication. Meanwhile, they cannot be effectively applied to the overlap between parameter communication and forward computation. In this article, we propose and develop iPart, a novel approach that partitions communication and computation in various partition sizes to overlap gradient communication with backward computation and parameter communication with forward computation. iPart formulates the partitioning decision as an optimization problem and solves it based on a greedy algorithm to derive communication and computation partitions. We implement iPart in the open-source DL framework BigDL and perform evaluations with various DL workloads. Experimental results show that iPart improves the scalability of a cluster of 72 nodes by up to 94 percent over the default PS and 52 percent over the layer by layer strategy. |
| doi_str_mv | 10.1109/TPDS.2021.3062721 |
| format | Article |
| fulltext | fulltext_linktorsrc |
| identifier | ISSN: 1045-9219 |
| ispartof | IEEE transactions on parallel and distributed systems, 2021-09, Vol.32 (9), p.2144-2159 |
| issn | 1045-9219; 1558-2183 |
| language | eng |
| recordid | cdi_ieee_primary_9366342 |
| source | IEEE Electronic Library (IEL) |
| subjects | backward computation; Communication; Computation; Computational modeling; Computer architecture; Constraints; forward computation; gradient communication; Greedy algorithms; Machine learning; Neural networks; Optimization; parameter communication; Parameter server; Parameters; Partitions; Scalability; Servers; Synchronization; Training |
| title | Overlapping Communication With Computation in Parameter Server for Scalable DL Training |
| url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T09%3A23%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Overlapping%20Communication%20With%20Computation%20in%20Parameter%20Server%20for%20Scalable%20DL%20Training&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Wang,%20Shaoqi&rft.date=2021-09-01&rft.volume=32&rft.issue=9&rft.spage=2144&rft.epage=2159&rft.pages=2144-2159&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2021.3062721&rft_dat=%3Cproquest_RIE%3E2503499814%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2503499814&rft_id=info:pmid/&rft_ieee_id=9366342&rfr_iscdi=true |
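The description field above states that iPart formulates the choice of communication/computation partitions as an optimization problem and solves it with a greedy algorithm. The following is a minimal sketch of one way such a greedy rule could look under a simple, assumed cost model; the gradient sizes, backward times, bandwidth, and per-message overhead are made-up parameters, and this is not the authors' BigDL implementation.

```python
"""Illustrative greedy partitioning sketch (hypothetical cost model, not iPart's code)."""

def greedy_partitions(grad_sizes, backward_times, bandwidth, per_msg_overhead):
    """Group consecutive layers' gradients into partitions, scanning from the
    last layer to the first (the order the backward pass produces them).

    A partition is flushed as soon as sending it would take longer than the
    backward computation still left to overlap with.
    """
    # remaining[i] = backward time still to run after layer i's gradient is ready
    remaining = [sum(backward_times[:i]) for i in range(len(backward_times))]

    partitions, current, current_bytes = [], [], 0.0
    for i in reversed(range(len(grad_sizes))):
        current.append(i)
        current_bytes += grad_sizes[i]
        send_time = per_msg_overhead + current_bytes / bandwidth
        if send_time >= remaining[i]:  # transfer can no longer be hidden
            partitions.append(current)
            current, current_bytes = [], 0.0
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # Hypothetical 6-layer model: gradient sizes in MB, backward times in ms,
    # 1 MB/ms effective bandwidth, 2 ms fixed overhead per message.
    grad_sizes = [4.0, 8.0, 16.0, 16.0, 32.0, 64.0]
    backward_times = [5.0, 5.0, 10.0, 10.0, 20.0, 30.0]
    parts = greedy_partitions(grad_sizes, backward_times,
                              bandwidth=1.0, per_msg_overhead=2.0)
    print("gradient partitions (layer indices, output side first):", parts)
```

The greedy rule in this sketch flushes a partition as soon as its estimated transfer time can no longer be hidden behind the remaining backward computation, trading per-message overhead against overlap; the paper's actual formulation additionally covers parameter communication overlapped with forward computation.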