Echo-CGC: A Communication-Efficient Byzantine-tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In this paper, we focus on a popular DML framework -- the parameter server
computation paradigm and iterative learning algorithms that proceed in rounds.
We aim to reduce the communication complexity of Byzantine-tolerant DML
algorithms in the single-hop radio network. Inspired by the CGC filter
developed by Gupta and Vaidya, PODC 2020, we propose a gradient descent-based
algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the broadcast
properties of the radio network to avoid transmitting the raw gradients (full
$d$-dimensional vectors). In the radio network, each worker is able to overhear
previous gradients that were transmitted to the parameter server. Roughly
speaking, in Echo-CGC, if a worker "agrees" with a combination of prior
gradients, it will broadcast an "echo message" instead of its raw local
gradient. The echo message contains a vector of coefficients (of size at most
$n$) and the ratio of the magnitude between two gradients (a float). In
comparison, the traditional approaches need to send $n$ local gradients in each
round, where each gradient is typically a vector in an ultra-high dimensional
space ($d\gg n$). The improvement in communication complexity of our algorithm
depends on multiple factors, including the number of nodes, the number of faulty
workers in an execution, and the cost function. We numerically analyze the
improvement and show that, with a large number of nodes, Echo-CGC reduces
communication by $80\%$ under standard assumptions. |
---|---|
DOI: | 10.48550/arxiv.2011.07447 |
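
The echo mechanism described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the summary only says that a worker echoes when it "agrees" with a combination of prior gradients and that the echo carries a coefficient vector plus a magnitude ratio. The least-squares fit, the relative-residual threshold `tol`, and the function names `make_message`, `reconstruct`, and `message_size` are all assumptions made for illustration.

```python
import numpy as np


def make_message(local_grad, overheard, tol=0.1):
    """Build either an echo or a raw message for one worker.

    ASSUMPTION: "agreement" is modelled as a small relative residual after a
    least-squares fit of the local gradient onto the overheard gradients.
    The abstract does not specify the actual test.
    """
    if overheard:
        basis = np.stack(overheard, axis=1)  # shape (d, k), with k <= n
        coeffs, *_ = np.linalg.lstsq(basis, local_grad, rcond=None)
        approx = basis @ coeffs
        approx_norm = np.linalg.norm(approx)
        grad_norm = np.linalg.norm(local_grad)
        residual = np.linalg.norm(local_grad - approx)
        if approx_norm > 0 and residual <= tol * grad_norm:
            # Echo: k coefficients plus one float (ratio of the two magnitudes).
            return ("echo", coeffs, grad_norm / approx_norm)
    # No agreement: fall back to the full d-dimensional gradient.
    return ("raw", local_grad)


def reconstruct(message, overheard):
    """Receiver side: rebuild a gradient from a raw or echo message."""
    if message[0] == "raw":
        return message[1]
    _, coeffs, ratio = message
    return ratio * (np.stack(overheard, axis=1) @ coeffs)


def message_size(message):
    """Number of floats transmitted (identifiers and headers ignored)."""
    return message[1].size + (1 if message[0] == "echo" else 0)


rng = np.random.default_rng(0)
d = 1000
g1, g2 = rng.normal(size=d), rng.normal(size=d)

agreeing = 2.0 * g1 + 3.0 * g2      # lies in the span of overheard gradients
unrelated = rng.normal(size=d)      # essentially orthogonal to that span

echo_msg = make_message(agreeing, [g1, g2])
raw_msg = make_message(unrelated, [g1, g2])
```

In this toy setting the agreeing worker sends three floats instead of $d = 1000$, which is where the saving comes from when $d \gg n$; the actual savings reported in the paper depend on the number of nodes, the number of faulty workers, and the cost function.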