Eigenvalue-corrected Natural Gradient Based on a New Approximation
Format: Article
Language: English
Abstract: Using second-order optimization methods to train deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC) (George et al., 2018), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the Kronecker-factored eigenbasis. Gao et al. (2020) consider a new approximation to the natural gradient, which approximates the Fisher information matrix (FIM) by a constant multiple of the Kronecker product of two matrices and keeps the trace equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors in the Kronecker-factored eigenbasis, but also adopts the new approximation method and the effective damping technique proposed by Gao et al. (2020). We also discuss the differences and relationships among the Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC, and TKFAC on several DNNs.
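For orientation, the sketch below writes out the two approximations referenced in the abstract in generic notation; the symbols F (a layer's Fisher block), A and B (Kronecker factors), U_A and U_B (their eigenbases), and s (re-scaling factors) are our own and are not taken from the paper, and the TEKFAC combination itself is not reproduced here.

```latex
% A minimal sketch of the two approximations mentioned in the abstract,
% written in generic notation; the Kronecker ordering depends on the
% vectorization convention used for the layer's weight matrix.

% Kronecker-factored approximation of a layer's Fisher block (KFAC-style),
% where a denotes the layer inputs and g the back-propagated gradients:
\[
  F \;\approx\; A \otimes B,
  \qquad
  A = \mathbb{E}\!\left[a a^{\top}\right],\quad
  B = \mathbb{E}\!\left[g g^{\top}\right].
\]

% Trace-restricted variant (the idea attributed to Gao et al., 2020):
% rescale the Kronecker product by a constant chosen so that the trace is
% preserved, using tr(A \otimes B) = tr(A) tr(B):
\[
  F \;\approx\;
  \frac{\operatorname{tr}(F)}{\operatorname{tr}(A)\,\operatorname{tr}(B)}
  \left(A \otimes B\right).
\]

% Eigenvalue correction (EKFAC-style): keep the Kronecker-factored eigenbasis
% U_A \otimes U_B obtained from A = U_A \Lambda_A U_A^\top and
% B = U_B \Lambda_B U_B^\top, but replace the Kronecker-product eigenvalues by
% re-scaling factors s_i estimated from the gradient projected onto that basis:
\[
  F \;\approx\;
  \left(U_A \otimes U_B\right) \operatorname{diag}(s)
  \left(U_A \otimes U_B\right)^{\top},
  \qquad
  s_i = \mathbb{E}\!\left[
    \left(\left(U_A \otimes U_B\right)^{\top} \nabla_\theta \ell\right)_i^{2}
  \right].
\]
```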
DOI: 10.48550/arxiv.2011.13609