Heterogeneity-Aware Gradient Coding for Tolerating and Leveraging Stragglers

Bibliographic details
Published in:	IEEE Transactions on Computers, 2022-04, Vol. 71 (4), pp. 779-794
Main authors:	Wang, Haozhao; Guo, Song; Tang, Bin; Li, Ruixuan; Yang, Yutong; Qu, Zhihao; Wang, Yi
Format:	Article
Language:	English
Subjects:	Algorithms; Clustering algorithms; Coding; Computation; Convergence; Distributed architecture; Distributed systems; Encoding; Gradient coding; Heterogeneity; Machine learning; Partitioning algorithms; Task analysis; Training

Description
Abstract:	Distributed gradient descent has been widely adopted in machine learning because it makes considerable computing resources available for processing huge volumes of data. Specifically, the gradient over the whole dataset is computed cooperatively by multiple workers. However, its performance can be severely affected by slow workers, namely stragglers. Recently, coding-based approaches have been introduced to mitigate the straggler problem, but they can hardly handle heterogeneity among workers. Moreover, they always discard the results of stragglers, which wastes considerable resources. In this article, we first investigate how to tolerate stragglers by discarding their results and then seek to leverage the stragglers. To tolerate stragglers, we propose a heterogeneity-aware coding scheme that encodes gradients adaptively to the computing capability of each worker. Theoretically, this scheme is optimal for straggler tolerance. Building on the scheme, we further propose an algorithm called DHeter-aware that exploits the gradients of stragglers, which we call delayed gradients. Moreover, we show theoretically that DHeter-aware achieves the same convergence rate as gradient descent without delayed gradients. Experiments on various tasks and clusters demonstrate that our coding scheme outperforms state-of-the-art methods and that DHeter-aware further accelerates the coding scheme, achieving 25 percent time savings.
ISSN:	0018-9340, 1557-9956
DOI:	10.1109/TC.2021.3063180
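
Illustration: The abstract refers to coding-based approaches that tolerate stragglers by discarding their results. As background, the sketch below shows the classic homogeneous gradient-coding construction for three equally fast workers and one straggler. It is not the heterogeneity-aware scheme of the article, whose construction is not given in this record. The names B, DECODE, encode, and decode are hypothetical and chosen for this example only.

```python
# Classic gradient coding with 3 workers and 3 data partitions, tolerating
# 1 straggler. ASSUMPTION: equally fast workers (homogeneous baseline), not
# the article's heterogeneity-aware scheme.

# Encoding matrix: row i holds the coefficients worker i applies to the
# partial gradients (g1, g2, g3). Each worker processes only two partitions.
B = [
    [0.5, 1.0, 0.0],   # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],  # worker 1 sends g2 - g3
    [0.5, 0.0, 1.0],   # worker 2 sends g1/2 + g3
]

# Decoding coefficients for every pair of workers that finishes in time.
DECODE = {
    (0, 1): (2.0, -1.0),
    (0, 2): (1.0, 1.0),
    (1, 2): (1.0, 2.0),
}


def encode(worker, partial_grads):
    """Return the coded gradient that `worker` sends to the master."""
    dim = len(partial_grads[0])
    return [
        sum(B[worker][j] * partial_grads[j][d] for j in range(3))
        for d in range(dim)
    ]


def decode(results):
    """Recover g1 + g2 + g3 from the coded gradients of any two workers.

    `results` maps a worker index to the coded gradient it returned.
    """
    survivors = tuple(sorted(results))
    coeffs = DECODE[survivors]
    dim = len(results[survivors[0]])
    return [
        sum(a * results[w][d] for a, w in zip(coeffs, survivors))
        for d in range(dim)
    ]


# Toy partial gradients, one per data partition.
partial_grads = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
full_gradient = [sum(g[d] for g in partial_grads) for d in range(2)]

# Whichever single worker straggles, the other two suffice.
recovered = {
    pair: decode({w: encode(w, partial_grads) for w in pair})
    for pair in DECODE
}
```

Each worker computes two of the three partial gradients, so the full gradient survives the loss of any one worker at the cost of doubled computation per worker. A heterogeneity-aware scheme, as the abstract describes, would instead adapt each worker's share to its computing capability.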