Accelerating Proximal Gradient Descent via Silver Stepsizes
Format: Article
Language: English
Abstract: Surprisingly, recent work has shown that gradient descent can be accelerated
without using momentum -- just by judiciously choosing stepsizes. An open
question raised by several papers is whether this phenomenon of stepsize-based
acceleration holds more generally for constrained and/or composite convex
optimization via projected and/or proximal versions of gradient descent. We
answer this in the affirmative by proving that the silver stepsize schedule
yields analogously accelerated rates in these settings. These rates are
conjectured to be asymptotically optimal among all stepsize schedules, and
match the silver convergence rate of vanilla gradient descent (Altschuler and
Parrilo, 2023), namely $O(\varepsilon^{- \log_{\rho} 2})$ for smooth convex
optimization and $O(\kappa^{\log_\rho 2} \log \frac{1}{\varepsilon})$ under
strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the
condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key
technical insight is the combination of recursive gluing -- the technique
underlying all analyses of gradient descent accelerated with time-varying
stepsizes -- with a certain Laplacian-structured sum-of-squares certificate for
the analysis of proximal point updates.
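For scale: since $\rho = 1 + \sqrt{2} \approx 2.414$, the exponent in these rates is $\log_\rho 2 = \frac{\ln 2}{\ln(1+\sqrt{2})} \approx 0.7864$, so the silver rates $O(\varepsilon^{-0.79})$ and $O(\kappa^{0.79} \log \frac{1}{\varepsilon})$ sit strictly between those of constant-stepsize gradient descent ($O(\varepsilon^{-1})$ and $O(\kappa \log \frac{1}{\varepsilon})$) and those of momentum-based acceleration ($O(\varepsilon^{-1/2})$ and $O(\sqrt{\kappa} \log \frac{1}{\varepsilon})$).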
DOI: 10.48550/arxiv.2412.05497
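As an illustrative sketch of the setting (not the paper's algorithm verbatim), the code below runs proximal gradient descent on a lasso problem with a time-varying stepsize schedule built by recursive gluing. The recursion used for the schedule, $h^{(1)} = [\sqrt{2}]$ and $h^{(j+1)} = [h^{(j)}, 1+\rho^{j-1}, h^{(j)}]$, follows one common statement of the silver schedule of Altschuler and Parrilo and should be read as an assumption here; the lasso objective and soft-thresholding prox are standard choices made only for concreteness.

```python
import numpy as np

RHO = 1 + np.sqrt(2)  # the silver ratio


def silver_schedule(k):
    """Length 2**k - 1 stepsize schedule (in units of 1/L), built by recursive gluing.

    Assumed form (one common statement of the smooth-convex silver schedule):
    h^(1) = [sqrt(2)],  h^(j+1) = [h^(j), 1 + RHO**(j-1), h^(j)].
    """
    h = [np.sqrt(2.0)]
    for j in range(1, k):
        h = h + [1.0 + RHO ** (j - 1)] + h
    return np.array(h)


def soft_threshold(v, tau):
    """Prox of tau * ||.||_1 (soft thresholding), the proximal step for the lasso penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def proximal_gradient_silver(A, b, lam, k=5):
    """Proximal gradient descent on 0.5*||Ax - b||^2 + lam*||x||_1 with time-varying stepsizes."""
    L = np.linalg.norm(A, 2) ** 2      # smoothness constant of the quadratic part
    x = np.zeros(A.shape[1])
    for h in silver_schedule(k):
        step = h / L                   # silver stepsize, scaled by 1/L
        grad = A.T @ (A @ x - b)       # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)  # proximal (shrinkage) step
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(60)
    x_hat = proximal_gradient_silver(A, b, lam=0.1)
    print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```

Note that individual steps in the schedule exceed the usual $2/L$ stability threshold; the acceleration claim concerns the schedule as a whole, not any single step.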