Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation
Published in: | arXiv.org, 2020-09 |
Main authors: | , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | We study the role that a finite timescale separation parameter \(\tau\) has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games, where the learning rate of player 1 is denoted by \(\gamma_1\) and the learning rate of player 2 is defined to be \(\gamma_2=\tau\gamma_1\). Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate (\(\tau=1\)) and the maximizing player approximately converging between each update of the minimizing player (\(\tau\rightarrow\infty\)). For the parameter choice of \(\tau=1\), it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as \(\tau\rightarrow\infty\). In this work, we bridge the gap between these regimes by showing there exists a finite timescale separation parameter \(\tau^{\ast}\) such that \(x^{\ast}\) is a stable critical point of gradient descent-ascent for all \(\tau \in (\tau^{\ast}, \infty)\) if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing \(\tau^{\ast}\), along with corresponding convergence rates and guarantees under both deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: if a critical point \(x^{\ast}\) is not a strict local minmax equilibrium, then there exists a finite timescale separation \(\tau_0\) such that \(x^{\ast}\) is unstable for all \(\tau\in (\tau_0, \infty)\). Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact that timescale separation has on training performance. |
ISSN: | 2331-8422 |
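
To make the role of the timescale parameter concrete, below is a minimal numerical sketch, not the paper's explicit construction of \(\tau^{\ast}\): it linearizes \(\tau\)-scaled gradient descent-ascent around the critical point of a hypothetical quadratic zero-sum game (the coefficients `a`, `b`, `c`, the learning rate `gamma1`, and the spectral-radius stability test are all illustrative assumptions, not values from the paper) and reports, for several values of \(\tau\), whether the strict local minmax equilibrium at the origin is a stable critical point.

```python
import numpy as np

# Illustrative sketch (not the paper's construction of tau*): tau-GDA on
# the quadratic zero-sum game
#     f(x1, x2) = (a/2) x1^2 + b x1 x2 + (c/2) x2^2,
# whose only critical point is (0, 0). With the hypothetical coefficients
# below, c < 0 and the Schur complement a - b^2/c = 7 > 0, so (0, 0) is a
# strict local minmax equilibrium, yet it is unstable under tau = 1.
a, b, c = -2.0, 3.0, -1.0
gamma1 = 0.01  # learning rate of player 1 (minimizer); gamma2 = tau * gamma1

def gda_jacobian(tau):
    """Jacobian at (0, 0) of the tau-GDA vector field
    (-grad_x1 f, +tau * grad_x2 f)."""
    return np.array([[-a,      -b],
                     [tau * b,  tau * c]])

for tau in [1.0, 2.0, 5.0, 20.0]:
    # The discrete update x <- x + gamma1 * J x is locally stable iff the
    # spectral radius of I + gamma1 * J is below 1.
    rho = np.max(np.abs(np.linalg.eigvals(np.eye(2) + gamma1 * gda_jacobian(tau))))
    print(f"tau = {tau:5.1f}  spectral radius = {rho:.6f}  "
          f"-> {'stable' if rho < 1 else 'unstable'}")
```

For these coefficients the printed status flips from unstable to stable between \(\tau=2\) and \(\tau=5\), illustrating on a toy example the abstract's claim that stability of a strict local minmax equilibrium holds precisely for all \(\tau\) beyond some finite \(\tau^{\ast}\).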