"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products


Bibliographic details
Published in: IEEE transactions on information theory 2019-10, Vol.65 (10), p.6171-6193
Main authors: Dutta, Sanghamitra; Cadambe, Viveck; Grover, Pulkit
Format: Article
Language: English
container_end_page 6193
container_issue 10
container_start_page 6171
container_title IEEE transactions on information theory
container_volume 65
creator Dutta, Sanghamitra
Cadambe, Viveck
Grover, Pulkit
description We consider the problem of computing a matrix-vector product Ax using a set of P parallel or distributed processing nodes prone to "straggling," i.e., unpredictable delays. Every processing node can access only a fraction (s/N) of the N-length vector x, and all processing nodes compute an equal number of dot products. We propose a novel error-correcting code, which we call "Short-Dot," that introduces redundant, shorter dot products such that only a subset of the nodes' outputs is sufficient to compute Ax. To address the problem of straggling in computing matrix-vector products, prior work uses replication or erasure coding to encode parts of the matrix A, but the length of the dot products computed at each processing node is still N. The key novelty in our work is that, instead of computing the long dot products required by the original matrix-vector product, we construct a larger number of redundant, short dot products that require only a fraction of x to be accessed during the computation. Short-Dot is thus useful in communication-constrained scenarios, since each processing node accesses only a fraction of x. Further, we show that in the regime where the number of available processing nodes exceeds the total number of dot products, Short-Dot has lower expected computation time under straggling (modeled with exponential delays) than existing strategies such as replication, in a scaling sense. We also derive fundamental limits on the trade-off between the length of the dot products and the recovery threshold, i.e., the required number of processing nodes, showing that Short-Dot is near-optimal.
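To make the abstract's ideas of "short dot products" and a "recovery threshold" concrete, here is a minimal, hedged NumPy sketch. It illustrates a simpler block-partition plus Vandermonde (MDS) baseline, not the paper's actual Short-Dot construction; all names and sizes (M, N, P, G) are chosen here purely for illustration. Each simulated worker reads only half of x, so its dot product is short, and any M worker outputs per block suffice to recover that block's contribution to Ax.

```python
import numpy as np

# Illustrative sketch only: a block-partition + Vandermonde (MDS) baseline,
# not the paper's Short-Dot code. Each simulated worker touches only half of
# the coordinates of x, and any M of the P workers per block are enough.
rng = np.random.default_rng(0)

M, N, P = 3, 8, 10                       # rows of A, length of x, workers per block
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

# Vandermonde encoding matrix: any M of its P rows form an invertible matrix,
# so the recovery threshold within each block is M.
G = np.vander(np.arange(1.0, P + 1), M, increasing=True)

Ax_hat = np.zeros(M)
for cols in (np.arange(0, N // 2), np.arange(N // 2, N)):
    coded_rows = G @ A[:, cols]          # P coded "short" rows of length N/2
    outputs = coded_rows @ x[cols]       # each worker computes one short dot product
    survivors = rng.choice(P, size=M, replace=False)   # the other P - M nodes straggle
    Ax_hat += np.linalg.solve(G[survivors], outputs[survivors])

print(np.allclose(Ax_hat, A @ x))        # True: Ax recovered despite stragglers
```

The paper's Short-Dot code goes further than this baseline: it designs the supports and the encoding jointly so that the dot-product length s can be traded against the recovery threshold, which is exactly the trade-off whose fundamental limits the abstract mentions.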
doi_str_mv 10.1109/TIT.2019.2927558
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9448
ispartof IEEE transactions on information theory, 2019-10, Vol.65 (10), p.6171-6193
issn 0018-9448
1557-9654
language eng
recordid cdi_proquest_journals_2292980150
source IEEE Electronic Library (IEL)
subjects Algorithm-based fault tolerance
coded computing
Delays
Distributed processing
Encoding
Error correcting codes
Error correction
Linear transformations
Machine learning
Mathematical analysis
Matrix algebra
Matrix methods
matrix sparsification
Nodes
Replication
Sensors
stragglers
Task analysis
Transforms
title "Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T09%3A02%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=%22Short-Dot%22:%20Computing%20Large%20Linear%20Transforms%20Distributedly%20Using%20Coded%20Short%20Dot%20Products&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Dutta,%20Sanghamitra&rft.date=2019-10-01&rft.volume=65&rft.issue=10&rft.spage=6171&rft.epage=6193&rft.pages=6171-6193&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2019.2927558&rft_dat=%3Cproquest_RIE%3E2292980150%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2292980150&rft_id=info:pmid/&rft_ieee_id=8758338&rfr_iscdi=true