Understanding the Role of Momentum in Stochastic Gradient Methods
Format: Article
Language: English
Online Access: Order full text
Abstract: The use of momentum in stochastic gradient methods has become a widespread
practice in machine learning. Different variants of momentum, including
heavy-ball momentum, Nesterov's accelerated gradient (NAG), and
quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks.
Despite these empirical successes, there is a lack of clear understanding of
how the momentum parameters affect convergence and various performance measures
of different algorithms. In this paper, we use the general formulation of QHM
to give a unified analysis of several popular algorithms, covering their
asymptotic convergence conditions, stability regions, and properties of their
stationary distributions. In addition, by combining the results on convergence
rates and stationary distributions, we obtain sometimes counter-intuitive
practical guidelines for setting the learning rate and momentum parameters.
DOI: 10.48550/arxiv.1910.13962
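As a rough illustration of the QHM formulation mentioned in the abstract, the sketch below shows one update step in the form commonly attributed to quasi-hyperbolic momentum: a momentum buffer kept as an exponential moving average of gradients, with the parameter step taken along a convex combination of the current gradient and that buffer. The function name, parameter names, and the toy objective are hypothetical and not taken from the paper; this is a minimal sketch, not the authors' implementation or notation.

```python
import numpy as np

def qhm_step(theta, buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update (illustrative sketch).

    theta : current parameters
    buf   : exponential moving average of past stochastic gradients
    grad  : stochastic gradient evaluated at theta
    lr    : learning rate
    beta  : momentum (averaging) coefficient
    nu    : interpolation weight; nu=0 recovers plain SGD and nu=1 a
            normalized heavy-ball update, while nu=beta gives a
            Nesterov-style (NAG) update.
    """
    buf = beta * buf + (1.0 - beta) * grad                 # update the momentum buffer
    theta = theta - lr * ((1.0 - nu) * grad + nu * buf)    # mix gradient and buffer
    return theta, buf

# Toy usage: minimize f(theta) = 0.5 * ||theta||^2 with noisy gradients.
rng = np.random.default_rng(0)
theta, buf = np.ones(5), np.zeros(5)
for _ in range(200):
    grad = theta + 0.01 * rng.normal(size=theta.shape)     # noisy gradient of the toy objective
    theta, buf = qhm_step(theta, buf, grad)
print(theta)  # parameters shrink toward zero
```

Varying nu while holding lr and beta fixed is one way to interpolate between the algorithm families the paper analyzes under a single formulation.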