Three Factors Influencing Minima in SGD

We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In orde...

Bibliographic details
Published in:	arXiv.org 2018-09
Main authors:	Jastrzębski, Stanisław; Kenton, Zachary; Arpit, Devansh; Ballas, Nicolas; Fischer, Asja; Bengio, Yoshua; Storkey, Amos
Format:	Article
Language:	English
Subjects:	Covariance; Differential equations; Learning; Learning curves; Manufacturing; Neural networks; Schedules
Online access:	Full text