Measuring the Effects of Data Parallelism on Neural Network Training
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Journal of Machine Learning Research 20 (2019) 1-49. Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. We study how this relationship varies with the training algorithm, model, and data set, and find extremely large variation between workloads. Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes. We find no evidence that larger batch sizes degrade out-of-sample performance. Finally, we discuss the implications of our results on efforts to train neural networks much faster in the future. Our experimental data is publicly available as a database of 71,638,836 loss measurements taken over the course of training for 168,160 individual models across 35 workloads. |
---|---|
DOI: | 10.48550/arxiv.1811.03600 |
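The central measurement described in the abstract is "steps to a goal out-of-sample error" as a function of batch size, with metaparameters re-tuned at each batch size. Below is a minimal sketch of that protocol on a toy logistic-regression problem; the model, data, hyperparameters, and function names are illustrative assumptions, not details of the paper's actual workloads or experimental framework.

```python
# Sketch (not the paper's code): for each batch size, train with plain mini-batch SGD
# and record the first step at which a fixed goal validation error is reached.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data with a held-out split for "out-of-sample" error.
d = 20
w_true = rng.normal(size=d)
X = rng.normal(size=(20000, d))
y = (X @ w_true + 0.5 * rng.normal(size=20000) > 0).astype(float)
X_train, y_train = X[:16000], y[:16000]
X_val, y_val = X[16000:], y[16000:]

def val_error(w):
    """Out-of-sample (validation) classification error for weights w."""
    preds = (X_val @ w > 0).astype(float)
    return np.mean(preds != y_val)

def steps_to_goal(batch_size, lr, goal_error=0.08, max_steps=50000):
    """Train with mini-batch SGD; return the first step whose validation error <= goal."""
    w = np.zeros(d)
    for step in range(1, max_steps + 1):
        idx = rng.integers(0, len(X_train), size=batch_size)
        xb, yb = X_train[idx], y_train[idx]
        p = 1.0 / (1.0 + np.exp(-xb @ w))      # sigmoid predictions
        grad = xb.T @ (p - yb) / batch_size    # logistic-loss gradient
        w -= lr * grad
        if step % 100 == 0 and val_error(w) <= goal_error:
            return step
    return max_steps  # goal not reached within the step budget

# The abstract stresses that metaparameters must be re-tuned at each batch size;
# here a small learning-rate sweep stands in for that tuning.
for bs in [16, 64, 256, 1024]:
    best = min(steps_to_goal(bs, lr) for lr in [0.01, 0.05, 0.2, 0.5])
    print(f"batch size {bs:5d}: ~{best} steps to goal validation error")
```

Plotting the resulting steps-to-goal against batch size is one way to visualize the kind of scaling relationship the paper characterizes across its 35 workloads.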