SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | We analyze stochastic gradient algorithms for optimizing nonconvex problems.
In particular, our goal is to find local minima (second-order stationary
points) rather than merely first-order stationary points, which may be
unstable saddle points. We show that a simple perturbed version of the
stochastic recursive gradient descent algorithm (called SSRGD) can find an
$(\epsilon,\delta)$-second-order stationary point with
$\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$
stochastic gradient complexity for nonconvex finite-sum problems. As a
by-product, SSRGD finds an $\epsilon$-first-order stationary point with
$O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost
optimal since Fang et al. [2018] provided a lower bound
$\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order
stationary point. We emphasize that the SSRGD algorithm for finding second-order
stationary points is as simple as the one for finding first-order stationary points:
it only adds a uniform perturbation occasionally, whereas all other algorithms for
finding second-order stationary points with similar gradient complexity need to
be combined with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and
Li, 2018]). Moreover, the simple SSRGD algorithm admits a simpler analysis.
We also extend our results from nonconvex finite-sum problems to
nonconvex online (expectation) problems and prove the corresponding
convergence results. |
---|---|
DOI: | 10.48550/arxiv.1904.09265 |
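
The summary describes the method only at a high level: recursive stochastic gradient steps plus an occasional uniform perturbation. The sketch below is one possible reading of that idea, not the paper's algorithm. It assumes the standard recursive estimator v_t = v_{t-1} + (1/b) Σ_i (∇f_i(x_t) − ∇f_i(x_{t-1})), refreshed with a full gradient at the start of each epoch, and it perturbs uniformly within a small ball whenever the full gradient is nearly zero. The toy objective, the function names (`grad_i`, `full_grad`, `uniform_ball`, `perturbed_recursive_gd`), and every parameter value are hypothetical choices for illustration; the record does not state the paper's step sizes, perturbation trigger, or stopping rules.

```python
import math
import random


def grad_i(i, x, a):
    """Gradient of the toy component
    f_i(x) = 0.5*a[i]*x[0]**2 - 0.5*x[1]**2 + 0.25*x[1]**4.
    The average objective has a saddle at (0, 0) and minima at (0, +-1)."""
    return [a[i] * x[0], -x[1] + x[1] ** 3]


def full_grad(x, a):
    """Average of all component gradients (one pass over the n components)."""
    n = len(a)
    g = [0.0, 0.0]
    for i in range(n):
        gi = grad_i(i, x, a)
        g[0] += gi[0] / n
        g[1] += gi[1] / n
    return g


def uniform_ball(dim, radius, rng):
    """Draw a point uniformly from the Euclidean ball of the given radius."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [d * scale for d in direction]


def perturbed_recursive_gd(x0, a, eta=0.1, batch=4, epoch_len=4, epochs=200,
                           g_thresh=1e-3, radius=1e-2, cooldown=10, seed=0):
    """Illustrative perturbed recursive-gradient loop (all defaults are assumptions)."""
    rng = random.Random(seed)
    n = len(a)
    x = list(x0)
    last_perturb = -cooldown
    for epoch in range(epochs):
        # Refresh the estimator with a full gradient at the start of each epoch.
        v = full_grad(x, a)
        small = math.sqrt(v[0] ** 2 + v[1] ** 2) <= g_thresh
        if small and epoch - last_perturb >= cooldown:
            # Near a first-order stationary point: add a uniform perturbation.
            xi = uniform_ball(2, radius, rng)
            x = [x[0] + xi[0], x[1] + xi[1]]
            v = full_grad(x, a)
            last_perturb = epoch
        for _ in range(epoch_len):
            x_new = [x[0] - eta * v[0], x[1] - eta * v[1]]
            # Recursive update: add the minibatch average of gradient differences.
            for i in [rng.randrange(n) for _ in range(batch)]:
                g_new = grad_i(i, x_new, a)
                g_old = grad_i(i, x, a)
                v[0] += (g_new[0] - g_old[0]) / batch
                v[1] += (g_new[1] - g_old[1]) / batch
            x = x_new
    return x


if __name__ == "__main__":
    a = [0.5, 1.0, 1.5, 1.0]  # hypothetical per-component curvatures
    print(perturbed_recursive_gd([0.0, 0.0], a))
```

Started exactly at the saddle point (0, 0), the loop without perturbation (`radius=0.0`) never moves because every gradient is zero there, while the perturbed version drifts along the negative-curvature direction toward one of the two minima. This toy run says nothing about the complexity bounds quoted in the summary.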