Min-Max Optimization under Delays
Format: Article
Language: English
Abstract: Delays and asynchrony are inevitable in large-scale machine-learning problems
where communication plays a key role. As such, several works have extensively
analyzed stochastic optimization with delayed gradients. However, as far as we
are aware, no analogous theory is available for min-max optimization, a topic
that has gained recent popularity due to applications in adversarial
robustness, game theory, and reinforcement learning. Motivated by this gap, we
examine the performance of standard min-max optimization algorithms with
delayed gradient updates. First, we show (empirically) that even small delays
can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on
simple instances for which \texttt{EG} guarantees convergence in the absence of
delays. Our empirical study thus suggests the need for a careful analysis of
delayed versions of min-max optimization algorithms. Accordingly, under
suitable technical assumptions, we prove that Gradient Descent-Ascent
(\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee
convergence to saddle points for convex-concave and strongly convex-strongly
concave settings. Our complexity bounds reveal, in a transparent manner, the
slow-down in convergence caused by delays.
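As a rough illustration of the dynamics the abstract discusses, the following toy sketch (our own construction, not the paper's experiment) runs \texttt{GDA}, \texttt{EG}, and a stale-gradient variant of \texttt{EG} on the bilinear game f(x, y) = xy, whose unique saddle point is the origin. The delay model and all function names here are illustrative assumptions; the paper's actual delay model and instances may differ.

```python
# Toy sketch: GDA vs. EG vs. EG with stale (delayed) gradients on the
# bilinear saddle-point problem f(x, y) = x * y. Saddle point: (0, 0).
import numpy as np

def F(z):
    """Gradient descent-ascent operator of f(x, y) = x*y: (df/dx, -df/dy)."""
    x, y = z
    return np.array([y, -x])

def gda(z0, eta, steps):
    """Plain Gradient Descent-Ascent; on this instance it spirals outward."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * F(z)
    return z

def eg(z0, eta, steps):
    """Extra-gradient: update with the gradient at a look-ahead point."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        w = z - eta * F(z)   # extrapolation ("look-ahead") step
        z = z - eta * F(w)   # update using the look-ahead gradient
    return z

def eg_delayed(z0, eta, steps, tau):
    """One possible delay model (our assumption): both gradient
    evaluations in EG arrive tau iterations late."""
    z = np.array(z0, dtype=float)
    zs, ws = [z.copy()], []
    for t in range(steps):
        w = z - eta * F(zs[max(t - tau, 0)])
        ws.append(w.copy())
        z = z - eta * F(ws[max(t - tau, 0)])
        zs.append(z.copy())
    return z

z0, eta, T = (1.0, 0.0), 0.1, 200
print("GDA      |z_T| =", np.linalg.norm(gda(z0, eta, T)))  # grows above 1
print("EG       |z_T| =", np.linalg.norm(eg(z0, eta, T)))   # shrinks below 1
# Behavior of the delayed variant depends on eta and tau; the paper
# reports that even small delays can make EG diverge on such instances.
print("EG+delay |z_T| =", np.linalg.norm(eg_delayed(z0, eta, T, tau=5)))
```

On this instance a short calculation confirms the first two printed norms: one GDA step scales the norm by sqrt(1 + eta^2) > 1, while one EG step scales it by sqrt(1 - eta^2 + eta^4) < 1 for eta < 1.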
DOI: 10.48550/arxiv.2307.06886