Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problem
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The symmetric low-rank matrix factorization serves as a building block in
many learning tasks, including matrix recovery and training of neural networks.
However, despite a flurry of recent research, the dynamics of its training via
non-convex factorized gradient-descent-type methods are not fully understood,
especially in the over-parameterized regime, where the fitted rank is higher
than the true rank of the target matrix. To overcome this challenge, we
characterize equilibrium points of the gradient flow dynamics and examine their
local and global stability properties. To facilitate a precise global analysis,
we introduce a nonlinear change of variables that brings the dynamics into a
cascade connection of three subsystems whose structure is simpler than the
structure of the original system. We demonstrate that the Schur complement to a
principal eigenspace of the target matrix is governed by an autonomous system
that is decoupled from the rest of the dynamics. In the over-parameterized
regime, we show that this Schur complement vanishes at an $O(1/t)$ rate,
thereby capturing the slow dynamics that arises from excess parameters. We
utilize a Lyapunov-based approach to establish exponential convergence of the
other two subsystems. By decoupling the fast and slow parts of the dynamics, we
offer new insight into the shape of the trajectories associated with local
search algorithms and provide a complete characterization of the equilibrium
points and their global stability properties. Such an analysis via nonlinear
control techniques may prove useful in several related over-parameterized
problems. |
---|---|
DOI: | 10.48550/arxiv.2411.15972 |
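
The decoupling of fast and slow dynamics described in the summary can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes the gradient flow takes the form dX/dt = (M - X X^T) X, which corresponds to the objective (1/4)||X X^T - M||_F^2 (the record does not state the normalization). It also assumes a hypothetical 2x2 target M = diag(1, 0) of true rank 1 fitted with rank 2, and a forward-Euler discretization. Under this assumed scaling, the Schur complement S of the leading block of X X^T satisfies dS/dt = -2 S^2, so in the scalar case S(t) = S(0) / (1 + 2 t S(0)), which decays at an O(1/t) rate. The names `schur_complement` and `gradient_flow` are illustrative.

```python
import numpy as np


def schur_complement(Z, k):
    """Schur complement of the leading k-by-k block of the matrix Z."""
    Z11, Z12 = Z[:k, :k], Z[:k, k:]
    Z21, Z22 = Z[k:, :k], Z[k:, k:]
    return Z22 - Z21 @ np.linalg.solve(Z11, Z12)


def gradient_flow(X0, M, dt, steps):
    """Forward-Euler discretization of dX/dt = (M - X X^T) X (assumed form)."""
    X = X0.copy()
    for _ in range(steps):
        X = X + dt * (M - X @ X.T) @ X
    return X


# Hypothetical over-parameterized instance: true rank 1, fitted rank 2.
M = np.diag([1.0, 0.0])
X0 = np.array([[0.6, 0.2],
               [0.1, 0.5]])
dt, steps = 1e-3, 50_000
T = dt * steps

X = gradient_flow(X0, M, dt, steps)
Z = X @ X.T

# Schur complement with respect to the principal eigenspace of M (first coordinate).
s0 = schur_complement(X0 @ X0.T, 1)[0, 0]
s_num = schur_complement(Z, 1)[0, 0]

# Closed-form solution of dS/dt = -2 S^2 under the assumed scaling.
s_closed = s0 / (1.0 + 2.0 * T * s0)

print("simulated Schur complement:", s_num)
print("closed-form prediction:    ", s_closed)
print("leading block of X X^T:    ", Z[0, 0])
```

In this sketch the leading block of X X^T settles near the nonzero eigenvalue of M quickly, while the Schur complement shrinks only algebraically, which is the slow behavior the summary attributes to excess parameters.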