Stochastic momentum methods for non-convex learning without bounded assumptions


Bibliographic Details
Published in: Neural Networks, 2023-08, Vol. 165, p. 830-845
Main authors: Liang, Yuqing; Liu, Jinlan; Xu, Dongpo
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Stochastic momentum methods are widely used to solve stochastic optimization problems in machine learning. However, most existing theoretical analyses rely on either bounded assumptions or strong stepsize conditions. In this paper, we focus on a class of non-convex objective functions satisfying the Polyak–Łojasiewicz (PL) condition and present a unified convergence rate analysis for stochastic momentum methods without any bounded assumptions, covering stochastic heavy ball (SHB) and stochastic Nesterov accelerated gradient (SNAG). Our analysis achieves the more challenging last-iterate convergence rate of function values under the relaxed growth (RG) condition, a weaker assumption than those used in related work. Specifically, we attain a sub-linear rate for stochastic momentum methods with diminishing stepsizes, and a linear convergence rate with constant stepsizes when the strong growth (SG) condition holds. We also examine the iteration complexity for obtaining an ϵ-accurate solution at the last iterate. Moreover, we provide a more flexible stepsize scheme for stochastic momentum methods in three respects: (i) relaxing the last-iterate convergence stepsize requirement from square summability to merely tending to zero; (ii) extending the minimum-iterate convergence rate stepsize to the non-monotonic case; (iii) expanding the last-iterate convergence rate stepsize to a more general form. Finally, we conduct numerical experiments on benchmark datasets to validate our theoretical findings.
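
To make the terminology in the abstract concrete: the PL condition requires (1/2)||∇f(x)||^2 ≥ μ (f(x) − f*) for some μ > 0, and SHB and SNAG differ only in where the stochastic gradient is evaluated. The following Python sketch shows the standard SHB and SNAG update rules with a diminishing stepsize on a toy quadratic; the objective, the stepsize schedule a_k = 0.1/k, and the momentum parameter β = 0.9 are illustrative assumptions and not necessarily the paper's exact setting.

    # Hedged sketch (assumed standard forms, not the paper's exact formulation):
    # SHB and SNAG with a diminishing stepsize on a toy quadratic objective.
    import numpy as np

    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0])            # quadratic objective, satisfies PL
    x_star = np.array([1.0, -2.0])      # minimizer

    def stochastic_grad(x):
        # unbiased noisy gradient of f(x) = 0.5 * (x - x_star)^T A (x - x_star)
        return A @ (x - x_star) + 0.1 * rng.standard_normal(2)

    def shb(x0, beta=0.9, iters=2000):
        # Stochastic heavy ball:
        #   x_{k+1} = x_k - a_k * g(x_k) + beta * (x_k - x_{k-1})
        x_prev, x = x0.copy(), x0.copy()
        for k in range(1, iters + 1):
            a_k = 0.1 / k               # diminishing stepsize (assumed schedule)
            x_next = x - a_k * stochastic_grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    def snag(x0, beta=0.9, iters=2000):
        # Stochastic Nesterov accelerated gradient: the gradient is evaluated at
        # the extrapolated point y_k = x_k + beta * (x_k - x_{k-1})
        x_prev, x = x0.copy(), x0.copy()
        for k in range(1, iters + 1):
            a_k = 0.1 / k
            y = x + beta * (x - x_prev)
            x_next = y - a_k * stochastic_grad(y)
            x_prev, x = x, x_next
        return x

    print("SHB last iterate: ", shb(np.zeros(2)))
    print("SNAG last iterate:", snag(np.zeros(2)))

Replacing the diminishing schedule with a constant stepsize in the same updates would correspond to the constant-stepsize regime discussed in the abstract under the strong growth condition.
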
ISSN: 0893-6080, 1879-2782
DOI: 10.1016/j.neunet.2023.06.021