Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Recently, there has been growing interest in the study of median-based algorithms
for distributed non-convex optimization. Two prominent such algorithms include
signSGD with majority vote, an effective approach for communication reduction
via 1-bit compression on the local gradients, and medianSGD, an algorithm
recently proposed to ensure robustness against Byzantine workers. The
convergence analyses for these algorithms critically rely on the assumption
that all the distributed data are drawn iid from the same distribution.
However, in applications such as Federated Learning, the data across different
nodes or machines can be inherently heterogeneous, which violates such an iid
assumption. This work analyzes signSGD and medianSGD in distributed settings
with heterogeneous data. We show that these algorithms are non-convergent
whenever there is some disparity between the expected median and mean over the
local gradients. To overcome this gap, we provide a novel gradient correction
mechanism that perturbs the local gradients with noise, together with a series of
results that provably close the gap between the mean and median of the gradients.
The proposed methods largely preserve nice properties of these methods, such as
the low per-iteration communication complexity of signSGD, and further enjoy
global convergence to stationary solutions. Our perturbation technique can be
of independent interest when one wishes to estimate the mean through a median
estimator. |
---|---|
DOI: | 10.48550/arxiv.1906.01736 |
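
The median/mean disparity described in the summary can be illustrated with a small numerical sketch. This is not the authors' algorithm: the three scalar "local gradients", the choice of Gaussian noise, the noise scale, and the helper names `majority_vote_sign` and `expected_perturbed_median` are all assumptions made for illustration. The sketch only shows that, with heterogeneous workers, the majority vote over signs can disagree with the sign of the mean gradient, and that adding symmetric noise before taking the median pulls the expected median toward the mean.

```python
import numpy as np


def majority_vote_sign(local_grads):
    """Aggregate 1-bit messages: sign of the sum of the workers' gradient signs."""
    return np.sign(np.sum(np.sign(local_grads), axis=0))


def expected_perturbed_median(local_grads, noise_std, num_draws=400_000, seed=0):
    """Monte Carlo estimate of E[median_i(g_i + noise_i)].

    Gaussian noise is an assumption of this sketch; the summary only says "noise".
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=(num_draws, len(local_grads)))
    # Median across workers for each draw, then average over draws.
    return np.median(local_grads + noise, axis=1).mean()


# Hypothetical heterogeneous workers: two small negative gradients, one large positive.
local_grads = np.array([-1.0, -1.0, 5.0])

mean_grad = local_grads.mean()          # 1.0, so the true descent direction is -1
median_grad = np.median(local_grads)    # -1.0, the opposite sign
vote = majority_vote_sign(local_grads)  # -1.0, agrees with the median, not the mean

gap_plain = abs(median_grad - mean_grad)
gap_noisy = abs(expected_perturbed_median(local_grads, noise_std=50.0) - mean_grad)

print("mean:", mean_grad, "median:", median_grad, "majority vote:", vote)
print("median-mean gap without noise:", gap_plain)
print("estimated gap with noise:", gap_noisy)
```

The noise scale is deliberately large relative to the spread of the gradients; this shrinks the gap in expectation but makes any single perturbed median much noisier, which is the trade-off a convergence analysis has to account for.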