Tight conditions for when the NTK approximation is valid
Format: Article
Language: English
Abstract: We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation to be valid until training time $T$. Our bound is tight and improves on the previous bound of Chizat et al. 2019, which required a larger rescaling factor of $\alpha = O(T^2)$.
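To make the abstract's claim concrete, here is a minimal sketch of the lazy-training objects it refers to; the notation ($f$ for the base model, $w_0$ for the initialization, $y$ for the targets) is assumed for illustration and is not fixed by this record:

$$
h_\alpha(w) \;=\; \alpha\, f(w),
\qquad
L(w) \;=\; \frac{1}{2}\,\bigl\| h_\alpha(w) - y \bigr\|^2,
\qquad
\bar h_\alpha(w) \;=\; \alpha\,\bigl( f(w_0) + D f(w_0)\,[\,w - w_0\,] \bigr).
$$

The NTK approximation replaces the trajectory of the rescaled model $h_\alpha$ under gradient flow on $L$ with the trajectory of its linearization $\bar h_\alpha$ around the initialization; the abstract's result is that the two trajectories remain close up to training time $T$ once the rescaling factor is of order $T$, rather than of order $T^2$ as in Chizat et al. 2019.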
DOI: 10.48550/arxiv.2305.13141