ASAP: Asynchronous Approximate Data-Parallel Computation
Saved in:
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Emerging workloads, such as graph processing and machine learning, are approximate because of the scale of the data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines using bulk-synchronous processing (BSP) or other synchronous processing paradigms such as map-reduce. However, data-parallel processing primitives such as repeated barrier and reduce operations introduce high synchronization overheads. Hence, many existing data-processing platforms use asynchrony and staleness to improve data-parallel job performance. Often, these systems simply change the synchronous communication between the worker nodes in the cluster to asynchronous communication. This improves the throughput of data processing but results in poor accuracy of the final output, since different workers may progress at different speeds and process inconsistent intermediate outputs.

In this paper, we present ASAP, a model that provides asynchronous and approximate processing semantics for data-parallel computation. ASAP provides fine-grained worker synchronization using NOTIFY-ACK semantics, which allows independent workers to run asynchronously. ASAP also provides stochastic reduce, an approximate reduce operation with guaranteed convergence to the same result as an aggregated all-reduce. Our results show that ASAP reduces synchronization costs, provides 2-10X speedups in convergence and up to 10X savings in network costs for distributed machine learning applications, and offers strong convergence guarantees.
DOI: 10.48550/arxiv.1612.08608
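
The abstract only names the two mechanisms, so below is a minimal, single-process sketch of how they might fit together: stochastic reduce (each worker averages its parameter with a small random subset of peers instead of performing a full all-reduce) and NOTIFY-ACK-style fine-grained synchronization (per-update notify/acknowledge messages in place of a global barrier). Everything here (`Worker`, `FANOUT`, the mailbox and ACK queues) is an illustrative assumption, not the API or implementation from the paper.

```python
import queue
import random
import threading

NUM_WORKERS = 4   # simulated workers (threads stand in for machines)
NUM_STEPS = 20    # local update steps per worker
FANOUT = 2        # peers contacted per stochastic reduce (instead of all N-1)


class Worker(threading.Thread):
    """One asynchronous worker holding a toy scalar 'model' parameter."""

    def __init__(self, rank, mailboxes, ack_queues):
        super().__init__()
        self.rank = rank
        self.param = float(rank + 1)   # toy parameter, different per worker
        self.mailboxes = mailboxes     # one inbox per worker for NOTIFY messages
        self.ack_queues = ack_queues   # one ACK queue per worker

    def run(self):
        for _ in range(NUM_STEPS):
            # Local "gradient" step (toy: pull the parameter toward zero).
            self.param -= 0.1 * self.param

            # Stochastic reduce: exchange with a sparse random subset of peers.
            peers = random.sample(
                [r for r in range(NUM_WORKERS) if r != self.rank], FANOUT)

            # NOTIFY: push our current value only to the chosen peers.
            for p in peers:
                self.mailboxes[p].put((self.rank, self.param))

            # Consume whatever updates have arrived in our own inbox and ACK
            # each sender so it knows its update was read.
            while True:
                try:
                    sender, value = self.mailboxes[self.rank].get_nowait()
                except queue.Empty:
                    break
                self.param = 0.5 * (self.param + value)  # pairwise average
                self.ack_queues[sender].put(self.rank)   # ACK back to sender

            # Fine-grained synchronization: wait (briefly) for ACKs from the
            # peers we notified instead of entering a global barrier. The
            # timeout keeps this toy version deadlock-free if a peer has
            # already finished its steps.
            acks = 0
            while acks < FANOUT:
                try:
                    self.ack_queues[self.rank].get(timeout=0.05)
                    acks += 1
                except queue.Empty:
                    break


def main():
    mailboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    ack_queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    workers = [Worker(r, mailboxes, ack_queues) for r in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("final parameters:", [round(w.param, 4) for w in workers])


if __name__ == "__main__":
    main()
```

In a real distributed deployment the mailboxes and ACK queues would be network channels and the reduce would operate on full parameter vectors; the convergence and speedup claims in the abstract refer to that setting, not to this toy simulation.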