Coded Sequential Matrix Multiplication for Straggler Mitigation

Bibliographic details
Published in:	IEEE Journal on Selected Areas in Information Theory, 2021-09, Vol. 2 (3), pp. 830-844
Main authors:	Krishnan, M. Nikhil; Hosseini, Erfan; Khisti, Ashish
Format:	Article
Language:	English
Keywords:	Coding; Computational modeling; Delays; Distributed computation; Encoding; Encoding-Decoding; erasure coding; matrix multiplication; Multiplication; Neural networks; Performance enhancement; polynomial codes; Polynomials; Redundancy; Resilience; Task analysis; Training; Workers

Description
Abstract:	In this work, we consider a sequence of $J$ matrix multiplication jobs that a master must distribute across multiple worker nodes. For $i \in \{1, 2, \ldots, J\}$, job $i$ begins in round $i$ and has to be completed by round $(i+T)$. To provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case $T = 0$. We propose two schemes with $T > 0$, which allow coding across workers as well as across time. Our first scheme is a modification of the polynomial coding scheme introduced by Yu et al. and places no assumptions on the straggler model. Exploiting the temporal dimension lets the scheme handle a larger set of straggler patterns than the polynomial coding scheme, for a given computational load per worker per round. The second scheme assumes a particular straggler model to further improve performance (in terms of encoding/decoding complexity). We develop theoretical results establishing (i) optimality of our proposed schemes for certain classes of straggler patterns and (ii) improved performance for the case of i.i.d. stragglers. These results are further validated by experiments in which we implement our schemes to train neural networks.
ISSN:	2641-8770
DOI:	10.1109/JSAIT.2021.3104970
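
The abstract treats coding across workers only ($T = 0$) as the baseline that the proposed schemes extend. As a rough illustration of that baseline, the sketch below implements a small polynomial-code example in the spirit of Yu et al.; it is not the paper's time-coded scheme. Several things here are assumptions made for illustration: the function names are hypothetical, the product is computed as $AB$ with $A$ split into $m$ row blocks and $B$ into $n$ column blocks, and exact rational arithmetic is used so that decoding is exact. Worker $k$ receives $\tilde{A}(x_k) = \sum_j A_j x_k^j$ and $\tilde{B}(x_k) = \sum_l B_l x_k^{lm}$, and the master interpolates the degree-$(mn-1)$ product polynomial from any $mn$ returned results.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def split_rows(A, m):
    step = len(A) // m
    return [A[i * step:(i + 1) * step] for i in range(m)]

def split_cols(B, n):
    step = len(B[0]) // n
    return [[row[i * step:(i + 1) * step] for row in B] for i in range(n)]

def lin_comb(blocks, coeffs):
    """Entrywise sum of coeffs[i] * blocks[i] over equally sized blocks."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(c * blk[r][k] for c, blk in zip(coeffs, blocks))
             for k in range(cols)] for r in range(rows)]

def encode(A_blocks, B_blocks, x):
    """Coded pair sent to the worker whose evaluation point is x."""
    m = len(A_blocks)
    A_enc = lin_comb(A_blocks, [x ** j for j in range(m)])
    B_enc = lin_comb(B_blocks, [x ** (l * m) for l in range(len(B_blocks))])
    return A_enc, B_enc

def invert(M):
    """Gauss-Jordan inverse over the rationals (M is assumed invertible)."""
    size = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(M)]
    for c in range(size):
        p = next(r for r in range(c, size) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(size):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[size:] for row in aug]

def decode(results, m, n):
    """Recover blocks C[j][l] = A_j B_l from m*n results given as {x: matrix}."""
    xs = list(results)[:m * n]
    vandermonde = [[Fraction(x) ** d for d in range(m * n)] for x in xs]
    v_inv = invert(vandermonde)
    values = [results[x] for x in xs]
    # The coefficient of x^(j + l*m) in the product polynomial is A_j B_l.
    coeffs = [lin_comb(values, v_inv[d]) for d in range(m * n)]
    return [[coeffs[j + l * m] for l in range(n)] for j in range(m)]

def assemble(C_blocks):
    """Stitch an m-by-n grid of blocks back into one matrix."""
    return [sum((blk[r] for blk in block_row), [])
            for block_row in C_blocks for r in range(len(block_row[0]))]

# Toy instance: 6 workers, m = n = 2, so any 4 results suffice (2 stragglers tolerated).
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
m, n, num_workers = 2, 2, 6
A_blocks, B_blocks = split_rows(A, m), split_cols(B, n)
worker_results = {x: matmul(*encode(A_blocks, B_blocks, x))
                  for x in range(1, num_workers + 1)}

# Pretend workers 2 and 5 straggle; decode from the remaining four.
survivors = {x: worker_results[x] for x in (1, 3, 4, 6)}
recovered = assemble(decode(survivors, m, n))
assert recovered == matmul(A, B)
```

In this toy setting each worker multiplies blocks half the size of $A$ and $B$ in each dimension, and the job tolerates any two stragglers out of six workers. The schemes described in the abstract additionally spread coded work over rounds $i$ through $i+T$, which this single-round sketch does not attempt to model.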