Stochastic Gradient Methods with Preconditioned Updates

Bibliographic Details
Published in: Journal of Optimization Theory and Applications, 2024-05, Vol. 201 (2), pp. 471-489
Authors: Sadiev, Abdurakhmon; Beznosikov, Aleksandr; Almansoori, Abdulla Jasem; Kamzolov, Dmitry; Tappenden, Rachael; Takáč, Martin
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: This work considers the non-convex finite-sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian and couple it with several gradient-based methods to give new 'scaled' algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL-condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.
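
The abstract outlines the core mechanism: estimate the diagonal of the Hessian with Hutchinson's method (Rademacher probes z, averaging z * (Hz)) and use that diagonal to rescale gradient steps. The sketch below is a minimal illustration of that idea only, not the paper's Scaled SARAH or Scaled L-SVRG; the function names, the finite-difference Hessian-vector products, the parameters (lr, alpha, beta), and the toy quadratic are all assumptions made for this example.

import numpy as np

def hutchinson_diag_estimate(grad_fn, x, num_samples=10, eps=1e-4, rng=None):
    """Estimate the Hessian diagonal at x via Hutchinson's method:
    E[z * (H z)] = diag(H) for Rademacher probes z.  Here H z is
    approximated by a finite difference of gradients (an assumption
    of this sketch, not the paper's construction)."""
    rng = np.random.default_rng() if rng is None else rng
    g0 = grad_fn(x)
    diag_est = np.zeros_like(x)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=x.shape)    # Rademacher probe
        hvp = (grad_fn(x + eps * z) - g0) / eps      # approximate H z
        diag_est += z * hvp
    return diag_est / num_samples

def preconditioned_gd(grad_fn, x0, lr=0.1, alpha=1e-3, beta=0.99, steps=200, seed=0):
    """Toy diagonally preconditioned gradient descent (illustration only):
    x <- x - lr * g / max(|D|, alpha), where D is a running Hutchinson
    estimate of the Hessian diagonal."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    D = hutchinson_diag_estimate(grad_fn, x, rng=rng)          # warm-start the preconditioner
    for _ in range(steps):
        g = grad_fn(x)
        d = hutchinson_diag_estimate(grad_fn, x, num_samples=1, rng=rng)
        D = beta * D + (1.0 - beta) * d                        # smooth the noisy estimate
        x = x - lr * g / np.maximum(np.abs(D), alpha)          # safeguarded diagonal scaling
    return x

# A badly scaled quadratic f(x) = 0.5 * x^T A x with A = diag(1, 1000):
A = np.diag([1.0, 1000.0])
grad_fn = lambda x: A @ x
print(preconditioned_gd(grad_fn, x0=[5.0, 5.0]))   # approaches the minimizer [0, 0]

On this badly scaled quadratic the safeguarded diagonal scaling makes both coordinates contract at roughly the same rate, which is the kind of behaviour the abstract attributes to the scaled methods.
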
ISSN: 0022-3239, 1573-2878
DOI: 10.1007/s10957-023-02365-3