3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
Abstract:

The performance and efficiency of distributed machine learning (ML) depend
significantly on how long it takes for nodes to exchange state changes.
Overly aggressive attempts to reduce communication often sacrifice final model
accuracy and necessitate additional ML techniques to compensate for this loss,
limiting their generality. Some attempts to reduce communication incur high
computation overhead, which makes their performance benefits visible only over
slow networks.

We present 3LC, a lossy compression scheme for state change traffic that
strikes a balance between multiple goals: traffic reduction, accuracy,
computation overhead, and generality. It combines three new techniques
(3-value quantization with sparsity multiplication, quartic encoding, and
zero-run encoding) to leverage the strengths of quantization and
sparsification techniques while avoiding their drawbacks. It achieves a data
compression ratio of up to 39--107X, nearly the same test accuracy for trained
models, and high compression speed. Distributed ML frameworks can employ 3LC
without modifications to existing ML algorithms. Our experiments show that 3LC
reduces the wall-clock training time of ResNet-110-based image classifiers for
CIFAR-10 on a 10-GPU cluster by up to 16--23X compared to TensorFlow's
baseline design.
DOI: 10.48550/arxiv.1802.07389
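
The abstract names its three techniques only briefly. As a rough, non-authoritative illustration of the first two, the NumPy sketch below shows one way a 3-value quantizer with a sparsity multiplier and a base-3 packing step could look. The function names (`three_value_quantize`, `quartic_encode`), the error-feedback buffer, and the exact scaling rule are assumptions made for this sketch, not the paper's actual design, which is specified in the article at the DOI above.

```python
import numpy as np

def three_value_quantize(tensor, error_buf, sparsity_multiplier=1.0):
    """Sketch of 3-value quantization with sparsity multiplication.

    Each element of (tensor + accumulated error) is mapped to one of
    {-M, 0, +M}, where M scales with the tensor's maximum magnitude and the
    sparsity multiplier. A larger multiplier zeroes out more elements (more
    sparsity and compression, at the cost of more per-round error). The
    quantization error is kept locally and folded into the next round.
    """
    corrected = tensor + error_buf                      # error feedback
    M = sparsity_multiplier * np.abs(corrected).max()
    if M == 0.0:
        q = np.zeros(corrected.shape, dtype=np.int8)
    else:
        # With sparsity_multiplier >= 1, |corrected / M| <= 1, so rounding
        # yields values in {-1, 0, +1}; clip guards smaller multipliers.
        q = np.clip(np.rint(corrected / M), -1, 1).astype(np.int8)
    new_error = corrected - q.astype(corrected.dtype) * M
    return q, M, new_error


def quartic_encode(q):
    """Sketch of quartic encoding: pack five {-1, 0, +1} values per byte.

    Five base-3 digits fit in one byte because 3**5 = 243 <= 256, i.e.
    8 / 5 = 1.6 bits per quantized element.
    """
    digits = q.astype(np.int16).ravel() + 1             # map {-1,0,1} -> {0,1,2}
    pad = (-digits.size) % 5
    digits = np.concatenate([digits, np.full(pad, 1, dtype=np.int16)])
    groups = digits.reshape(-1, 5)
    weights = np.array([81, 27, 9, 3, 1], dtype=np.int16)
    packed = (groups * weights).sum(axis=1).astype(np.uint8)
    return packed, pad


# Hypothetical round of state-change exchange: quantize, pack, and keep the
# residual error for the next round. A receiver would unpack the bytes back
# to {-1, 0, +1}, multiply by the transmitted scale M, and apply the result
# as the (lossy) state change.
grad = np.random.randn(256, 256).astype(np.float32)
err = np.zeros_like(grad)
q, M, err = three_value_quantize(grad, err, sparsity_multiplier=1.5)
packed, pad = quartic_encode(q)
```

In this sketch, the abstract's third technique, zero-run encoding, would be a further pass over `packed` that replaces long runs of the all-zero byte with short run-length codes, which is where the sparsity created by a larger multiplier would translate into additional traffic reduction.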