"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
Published in: | IEEE transactions on information theory 2019-10, Vol.65 (10), p.6171-6193 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | We consider the problem of computing a matrix-vector product Ax using a set of P parallel or distributed processing nodes prone to "straggling," i.e., unpredictable delays. Every processing node can access only a fraction s/N of the length-N vector x, and all processing nodes compute an equal number of dot products. We propose a novel error-correcting code, which we call "Short-Dot," that introduces redundant, shorter dot products such that the outputs of only a subset of the nodes are sufficient to compute Ax. To address the problem of straggling in computing matrix-vector products, prior work uses replication or erasure coding to encode parts of the matrix A, but the length of the dot products computed at each processing node is still N. The key novelty in our work is that instead of computing the long dot products as required in the original matrix-vector product, we construct a larger number of redundant and short dot products that only require a fraction of x to be accessed during the computation. Short-Dot is thus useful in a communication-constrained scenario as it allows for only a fraction of x to be accessed by each processing node. Further, we show that in the particular regime where the number of available processing nodes is greater than the total number of dot products, Short-Dot has lower expected computation time under straggling, under an exponential model, compared to existing strategies, e.g., replication, in a scaling sense. We also derive fundamental limits on the trade-off between the length of the dot products and the recovery threshold, i.e., the required number of processing nodes, showing that Short-Dot is near-optimal. |
---|---|
ISSN: | 0018-9448 1557-9654 |
DOI: | 10.1109/TIT.2019.2927558 |
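
The idea in the summary, redundant dot products that each touch only s of the N entries of x, can be illustrated with a small exact-arithmetic sketch. This is not taken from the article: the Vandermonde encoding matrix `B`, the cyclic zero pattern, the one-dot-product-per-node setup, and all function names (`solve`, `short_dot_encode`, `short_dot_decode`) are assumptions made for illustration. In this sketch, each column of the encoded matrix `F` is forced to have K - M zeros, which gives a recovery threshold of K = P(1 - s/N) + M when P(N - s) is divisible by N.

```python
from fractions import Fraction
from itertools import combinations

def solve(mat, rhs):
    """Solve the square system mat * y = rhs exactly (Gauss-Jordan elimination)."""
    n = len(mat)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(mat, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # find a pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def short_dot_encode(A, P, s):
    """Build a P x N matrix F with at most s nonzeros per row (illustrative).

    Assumes P * (N - s) is divisible by N and that each node computes
    exactly one dot product.
    """
    M, N = len(A), len(A[0])
    z = P * (N - s) // N          # zeros forced into each column (= K - M)
    K = z + M                     # recovery threshold of this sketch
    # Vandermonde rows with distinct nonzero nodes: any K rows are invertible.
    B = [[Fraction(i) ** k for k in range(K)] for i in range(1, P + 1)]
    F = [[Fraction(0)] * N for _ in range(P)]
    for j in range(N):
        zero_rows = [(j * z + t) % P for t in range(z)]   # cyclic zero pattern
        # Pick the auxiliary column Zj so that F[u][j] == 0 for u in zero_rows.
        rhs = [-sum(B[u][m] * A[m][j] for m in range(M)) for u in zero_rows]
        Zj = solve([B[u][M:] for u in zero_rows], rhs) if z else []
        col = [Fraction(A[m][j]) for m in range(M)] + Zj
        for p in range(P):
            F[p][j] = sum(B[p][k] * col[k] for k in range(K))
    return F, B, K

def short_dot_decode(B, K, M, finished, results):
    """Recover Ax from the outputs of any K finished nodes."""
    rows = finished[:K]
    y = solve([B[p] for p in rows], [results[p] for p in rows])
    return y[:M]                  # first M entries are Ax; the rest are auxiliary

# Toy instance: M = 2 dot products, N = 8, P = 8 nodes, each reading s = 4 entries.
A = [[1, 2, 3, 4, 5, 6, 7, 8],
     [2, 0, 1, 3, 1, 0, 2, 1]]
x = [1, -1, 2, 0, 3, 1, -2, 4]
P, s = 8, 4
F, B, K = short_dot_encode(A, P, s)

# Each node p computes one short dot product over the nonzero entries of F[p].
results = [sum(F[p][j] * x[j] for j in range(len(x)) if F[p][j] != 0)
           for p in range(P)]
expected = [sum(a * b for a, b in zip(row, x)) for row in A]
all_subsets_ok = all(
    short_dot_decode(B, K, len(A), list(sub), results) == expected
    for sub in combinations(range(P), K)
)
print("recovery threshold K =", K)
print("every K-subset of nodes recovers Ax:", all_subsets_ok)
```

Exact `Fraction` arithmetic is used so the forced zeros in `F` are exactly zero and the decoded values can be compared with `==`; a floating-point version would need tolerances and a better-conditioned encoding matrix than a Vandermonde one.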