Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Format: | Article |
---|---|
Language: | English |
Online access: | Order full text |
Abstract: | Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training. |
---|---|
DOI: | 10.48550/arxiv.2312.05705 |
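The abstract describes SINGD only at a high level. As a purely illustrative sketch (not the paper's actual update rules), the snippet below shows what an inverse-free, structured Kronecker-factored preconditioning step could look like for a single linear layer: the inverse factors K and C are maintained by multiplicative fixed-point iterations rather than explicit matrix inversion, and a diagonal structure is imposed on them. All names, hyperparameters, and the specific update form here are assumptions for illustration; see arXiv:2312.05705 for the method itself.

```python
# Illustrative sketch only: shows the *shape* of an inverse-free,
# structured Kronecker-factored step; not the algorithm from the paper.
import torch

def singd_like_step(W, layer_in, grad_out, K, C, lr=1e-3, beta=1e-2, damping=1e-3):
    """One preconditioned step for a linear layer with weight W (d_out x d_in).

    layer_in : layer inputs       (batch x d_in)  -> Kronecker factor A statistics
    grad_out : output gradients   (batch x d_out) -> Kronecker factor G statistics
    K, C     : maintained approximations of A^{-1/2} and G^{-1/2}; kept diagonal
               here as one possible "structure" (hypothetical choice).
    """
    batch = layer_in.shape[0]

    # Structured (diagonal) Kronecker-factor statistics with damping.
    A_diag = (layer_in ** 2).mean(0) + damping
    G_diag = (grad_out ** 2).mean(0) + damping

    # Inverse-free multiplicative updates: fixed point of K**2 * A_diag = 1 is
    # K = A_diag**(-1/2), so K and C drift toward the inverse square roots of
    # the factors without any call to torch.linalg.inv or a decomposition.
    K.mul_(1 - 0.5 * beta * (K ** 2 * A_diag - 1))
    C.mul_(1 - 0.5 * beta * (C ** 2 * G_diag - 1))

    # Kronecker-preconditioned gradient: scale by G^{-1} on the left and
    # A^{-1} on the right; in the diagonal case this is elementwise.
    grad_W = grad_out.T @ layer_in / batch
    W -= lr * (C[:, None] ** 2) * grad_W * (K[None, :] ** 2)
    return W, K, C

# Example usage (shapes only):
# W = torch.randn(64, 128); K = torch.ones(128); C = torch.ones(64)
# W, K, C = singd_like_step(W, torch.randn(32, 128), torch.randn(32, 64), K, C)
```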