Tight conditions for when the NTK approximation is valid
Published in: | arXiv.org 2023-11 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of \(\alpha = O(T)\) suffices for the NTK approximation to be valid until training time \(T\). Our bound is tight and improves on the previous bound of Chizat et al. 2019, which required a larger rescaling factor of \(\alpha = O(T^2)\). |
ISSN: | 2331-8422 |
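
The rescaling described in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-parameter toy model h(w) = w + 0.5 w^2 with h(0) = 0, the target y = 1, the horizon T = 5, and the Euler step size. It trains the rescaled model alpha * h(w) on the square loss scaled by 1 / alpha^2, in the style of the lazy training setup, and compares its output with that of the model linearized at initialization. It only shows the qualitative effect that a larger alpha keeps the two trajectories closer up to time T; it does not check the rate or the tightness claimed in the abstract.

```python
def max_output_gap(alpha, T=5.0, dt=0.01, y=1.0):
    """Largest gap on [0, T] between the rescaled toy model and its linearization.

    Toy setup (an assumption, not the paper's): h(w) = w + 0.5 * w^2, w0 = 0,
    loss (1 / alpha^2) * 0.5 * (alpha * h(w) - y)^2, Euler-discretized gradient flow.
    """
    w = 0.0      # weight of the nonlinear model; h(w0) = 0 at w0 = 0
    w_lin = 0.0  # weight of the linearized model
    gap = 0.0
    for _ in range(int(round(T / dt))):
        # Gradient of the scaled loss: (1 / alpha) * h'(w) * (alpha * h(w) - y).
        grad = (1.0 + w) * (alpha * (w + 0.5 * w * w) - y) / alpha
        # Same loss with h replaced by its linearization at w0 = 0, which is w itself.
        grad_lin = (alpha * w_lin - y) / alpha
        w -= dt * grad
        w_lin -= dt * grad_lin
        out = alpha * (w + 0.5 * w * w)  # rescaled model output
        out_lin = alpha * w_lin          # rescaled linearized output
        gap = max(gap, abs(out - out_lin))
    return gap


alphas = [1.0, 10.0, 100.0]
gaps = [max_output_gap(a) for a in alphas]
for a, g in zip(alphas, gaps):
    print(f"alpha = {a:6.1f}   max gap on [0, T] = {g:.4f}")
```

In this toy setting the printed gap shrinks as alpha grows, which is the behavior the lazy training regime relies on.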