Papaya: Practical, Private, and Scalable Federated Learning
Format: Article
Language: English
Abstract: Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning; variability in the system characteristics on each device and millions of clients coordinating with a central server are primary ones. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging, since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high-concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL.
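
To make the contrast with synchronized rounds concrete, below is a minimal Python sketch of one common asynchronous aggregation pattern: the server applies each client update as it arrives, down-weighted by staleness, so stragglers never block a round. This is an illustration under assumptions, not the system described in the paper; the names (`AsyncFLServer`, `staleness_weight`, `checkout`, `apply_update`) and the specific polynomial discount are hypothetical.

```python
import numpy as np

def staleness_weight(staleness: int) -> float:
    # Hypothetical polynomial staleness discount: stale updates still
    # contribute, but with diminishing weight. The paper's exact
    # weighting scheme is not reproduced here.
    return 1.0 / (1.0 + staleness) ** 0.5

class AsyncFLServer:
    """Sketch of asynchronous server-side aggregation: each client
    update is applied as soon as it arrives, so a slow client
    (straggler) never blocks a synchronized round."""

    def __init__(self, model: np.ndarray, server_lr: float = 1.0):
        self.model = model
        self.server_lr = server_lr
        self.version = 0  # incremented on every applied update

    def checkout(self) -> tuple[np.ndarray, int]:
        # A client downloads the current model and records its version.
        return self.model.copy(), self.version

    def apply_update(self, delta: np.ndarray, base_version: int) -> None:
        # Updates computed against an older model are stale; down-weight
        # them by how many server updates they missed instead of
        # discarding them.
        staleness = self.version - base_version
        self.model += self.server_lr * staleness_weight(staleness) * delta
        self.version += 1

# Usage: a fast client and a straggler start from the same version; the
# straggler's update arrives one version late and is down-weighted.
server = AsyncFLServer(model=np.zeros(4))
w_fast, v_fast = server.checkout()
w_slow, v_slow = server.checkout()
server.apply_update(np.ones(4), v_fast)  # staleness 0, full weight
server.apply_update(np.ones(4), v_slow)  # staleness 1, weight ~0.71
print(server.model)                      # ~[1.707, 1.707, 1.707, 1.707]
```

Because updates are applied one at a time, adding more concurrent clients increases update throughput rather than inflating an effective batch size, which is the intuition behind the diminishing returns of synchronous FL noted in the abstract.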
DOI: 10.48550/arxiv.2111.04877