Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity
Format: Article
Language: English
Abstract: Federated Learning is a novel paradigm for learning from data samples
distributed across a large network of clients while the data remain local.
Federated learning is, however, known to face several system challenges,
including system heterogeneity, where clients have different computation and
communication capabilities. Heterogeneity in clients' computation speeds limits
the scalability of federated learning algorithms and significantly slows down
their runtime because of stragglers. In this paper, we propose a novel
straggler-resilient federated learning method that incorporates the statistical
characteristics of the clients' data to adaptively select clients and speed up
the learning procedure. The key idea of our algorithm is to start training with
the faster nodes and gradually involve the slower nodes once the statistical
accuracy of the data of the currently participating nodes is reached. This
reduces the overall runtime required to achieve the statistical accuracy of the
data of all nodes, because the solution of each stage is close to the solution
of the subsequent stage, which has more samples, and can therefore serve as a
warm start. Our theoretical results characterize the speedup gain over standard
federated benchmarks for strongly convex objectives, and our numerical
experiments demonstrate significant wall-clock speedups of our
straggler-resilient method compared to federated learning benchmarks.
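The fastest-first schedule described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm or code: it assumes a ridge-regularized least-squares objective (which is strongly convex), synchronous rounds whose duration is set by the slowest participating client, a hypothetical per-round compute time of `1.5 ** i` for client `i`, one averaged gradient step per round as a stand-in for the local solver, a doubling schedule starting from the two fastest clients, and an assumed statistical accuracy `V_n = c / n` used as the tolerance on the squared gradient norm. All function names (`straggler_resilient`, `all_clients`, `accuracy`, and so on) are illustrative.

```python
import random


def make_clients(num_clients=16, samples=20, dim=5, noise=0.1, seed=0):
    """Hypothetical setup: each client holds a small linear-regression dataset.
    Client i is assumed to need 1.5**i time units per round, so client speeds
    are strongly heterogeneous (the last clients are stragglers)."""
    rng = random.Random(seed)
    w_true = [1.0] * dim
    clients = []
    for i in range(num_clients):
        data = []
        for _ in range(samples):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            y = sum(a * b for a, b in zip(x, w_true)) + rng.gauss(0.0, noise)
            data.append((x, y))
        clients.append({"data": data, "time": 1.5 ** i})
    return clients


def local_gradient(w, data, reg):
    """Gradient of (1/2m) * sum((x.w - y)^2) + (reg/2) * ||w||^2 on one client."""
    g = [reg * wj for wj in w]
    for x, y in data:
        r = sum(a * b for a, b in zip(x, w)) - y
        for j, xj in enumerate(x):
            g[j] += r * xj / len(data)
    return g


def run_stage(w, participants, reg, lr, tol):
    """Synchronous gradient rounds over `participants` until ||grad||^2 <= tol.
    Each round lasts as long as the slowest participant needs."""
    round_time = max(c["time"] for c in participants)
    elapsed = 0.0
    while True:
        grads = [local_gradient(w, c["data"], reg) for c in participants]
        elapsed += round_time  # the server waits for the straggler
        # Plain mean: every client holds the same number of samples here.
        g = [sum(col) / len(grads) for col in zip(*grads)]
        if sum(gj * gj for gj in g) <= tol:
            return w, elapsed
        w = [wj - lr * gj for wj, gj in zip(w, g)]


def accuracy(num_samples, c=1.0):
    """Assumed statistical accuracy V_n = c / n, used as the stage tolerance."""
    return c / num_samples


def straggler_resilient(clients, n0=2, reg=0.01, lr=0.3):
    """Fastest-first schedule: solve on the n fastest clients up to their
    statistical accuracy, then double n and warm-start from the current w."""
    ranked = sorted(clients, key=lambda c: c["time"])
    w = [0.0] * len(ranked[0]["data"][0][0])
    total, n, stages = 0.0, n0, []
    while True:
        n = min(n, len(ranked))
        group = ranked[:n]
        tol = accuracy(sum(len(c["data"]) for c in group))
        w, t = run_stage(w, group, reg, lr, tol)  # warm start from previous stage
        total += t
        stages.append(n)
        if n == len(ranked):
            return w, total, stages
        n *= 2


def all_clients(clients, reg=0.01, lr=0.3):
    """Baseline: every round uses all clients, starting from w = 0."""
    w0 = [0.0] * len(clients[0]["data"][0][0])
    tol = accuracy(sum(len(c["data"]) for c in clients))
    return run_stage(w0, clients, reg, lr, tol)


if __name__ == "__main__":
    clients = make_clients()
    _, t_fast, stages = straggler_resilient(clients)
    _, t_base = all_clients(clients)
    print("stages:", stages)  # number of participating clients per stage
    print(f"straggler-resilient time: {t_fast:.1f}, all-clients time: {t_base:.1f}")
```

In this toy setting, most rounds run on the fast clients, and the final stage, which must wait for the slowest client, starts from a warm start that is already close to the full-data solution, so it needs only a few expensive rounds. The size of the gain depends on the assumed speed spread (`1.5 ** i`) and the accuracy constant `c`; neither value comes from the paper.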
DOI: 10.48550/arxiv.2012.14453