Adaptive Gradient Coding

Bibliographic details
Published in: IEEE/ACM Transactions on Networking, 2022-04, Vol. 30 (2), p. 717-734
Main authors: Cao, Hankun; Yan, Qifa; Tang, Xiaohu; Han, Guojun
Format: Article
Language: English
Description
Summary: This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing results, which are designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) with flexible communication cost for a varying number of stragglers. Our scheme achieves an optimal tradeoff between computation load, straggler tolerance, and communication cost by allowing workers to send multiple signals sequentially to the master. In particular, it can minimize the communication cost according to the unknown real-time number of stragglers in practical environments. In addition, we present Group AGC (G-AGC), which combines the group idea with AGC to resist more stragglers in some situations. Numerical and simulation results demonstrate that our adaptive schemes achieve the smallest average running time.
ISSN: 1063-6692, 1558-2566
DOI: 10.1109/TNET.2021.3122873