Theory of Deep Learning IIb: Optimization Properties of SGD
Saved in:

Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability, like the classical Langevin equation, on large-volume, "flat" minima, selecting flat minimizers which are, with very high probability, also global minimizers.
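For context, the Langevin comparison in the abstract can be made precise with the standard overdamped Langevin equation and its Gibbs stationary distribution. The sketch below is a generic illustration in our own notation (U for the loss, T for the temperature, W_t for the iterate), not a formula quoted from the paper:

```latex
% Overdamped Langevin dynamics on a loss U(w) at temperature T
% (standard formulation; notation is ours, not the paper's):
\[
  dW_t \;=\; -\nabla U(W_t)\,dt \;+\; \sqrt{2T}\,dB_t ,
\]
% whose stationary (Gibbs) distribution is
\[
  p(w) \;\propto\; e^{-U(w)/T},
  \qquad
  \mathbb{P}\!\left(W_\infty \in R\right) \;=\;
  \frac{\int_R e^{-U(w)/T}\,dw}{\int e^{-U(w)/T}\,dw}.
\]
% For small T this probability mass concentrates on regions R where U is near
% its minimum and whose volume is large, i.e. on wide, "flat" minima; this is
% the sense in which the paper conjectures that SGD, whose gradient noise plays
% the role of the Brownian term, "concentrates in probability" on flat minima.
```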
DOI: 10.48550/arxiv.1801.02254