Stochastic Optimization with Laggard Data Pipelines
| Main author(s) | |
| --- | --- |
| Format | Article |
| Language | English |
| Subjects | |
| Online access | Order full text |
Abstract: State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes. As a consequence, CPU-bound preprocessing and disk/memory/network operations have emerged as new performance bottlenecks, as opposed to hardware-accelerated gradient computations. In this regime, a recently proposed approach is data echoing (Choi et al., 2019), which takes repeated gradient steps on the same batch while waiting for fresh data to arrive from upstream. We provide the first convergence analyses of "data-echoed" extensions of common optimization methods, showing that they exhibit provable improvements over their synchronous counterparts. Specifically, we show that in convex optimization with stochastic minibatches, data echoing affords speedups on the curvature-dominated part of the convergence rate, while maintaining the optimal statistical rate.
DOI: 10.48550/arxiv.2010.13639
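The abstract describes data echoing as taking repeated gradient steps on the same batch while waiting for fresh data to arrive from the pipeline. Below is a minimal NumPy sketch of that idea on a least-squares problem; the `fetch_batch` stand-in, the echo factor, step size, seeds, and problem setup are illustrative assumptions and not the configuration analyzed in the paper.

```python
import numpy as np

n_features = 20
w_true = np.random.default_rng(0).normal(size=n_features)


def fetch_batch(rng, batch_size=64):
    """Stands in for a slow upstream pipeline (disk reads, CPU preprocessing)."""
    X = rng.normal(size=(batch_size, n_features))
    y = X @ w_true + 0.1 * rng.normal(size=batch_size)
    return X, y


def data_echoed_sgd(echo_factor, num_fresh_batches=200, lr=0.05, seed=1):
    """Minibatch SGD that reuses each fetched batch `echo_factor` times before
    requesting the next one; echo_factor=1 recovers plain minibatch SGD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    for _ in range(num_fresh_batches):
        X, y = fetch_batch(rng)           # expensive: waits on the data pipeline
        for _ in range(echo_factor):      # cheap: extra gradient steps on the same batch
            grad = X.T @ (X @ w - y) / len(y)
            w -= lr * grad
    return w


print("echoed (x4) error:", np.linalg.norm(data_echoed_sgd(echo_factor=4) - w_true))
print("plain  (x1) error:", np.linalg.norm(data_echoed_sgd(echo_factor=1) - w_true))
```

For the same number of fetched batches, the echoed run takes four times as many gradient steps without additional pipeline waits, which is the trade-off between optimization progress and data freshness that the abstract's analysis addresses.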