Kernel and Rich Regimes in Overparametrized Models
Format: Article
Language: English
Abstract: A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, so that training with gradient descent has the effect of finding the minimum-RKHS-norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach, we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a simple two-layer model that already exhibits an interesting and meaningful transition between the kernel and rich regimes, and we demonstrate the transition for more complex matrix factorization models and multilayer non-linear networks.
DOI: 10.48550/arxiv.1906.05827
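
To make the abstract's central claim concrete, here is a minimal, hypothetical sketch (not the paper's code) of how the initialization scale can move gradient descent between a kernel-like and a rich solution. It assumes a diagonal two-layer parametrization beta = u**2 - v**2 on a synthetic sparse regression problem; the dimensions, alpha values, and learning-rate choice are all assumptions made for illustration, and only the qualitative effect of the scale alpha is the point.

```python
# Illustrative sketch only (not code from the paper): a two-layer "diagonal"
# linear model beta = u**2 - v**2 trained by gradient descent on an
# underdetermined sparse regression problem. The initialization scale `alpha`
# plays the role of the scale discussed in the abstract: large alpha should
# behave like the lazy/kernel regime (a dense interpolant), while small alpha
# should behave like the rich regime (a nearly sparse solution).
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 30, 100, 3                            # n < d: many interpolating solutions
X = rng.standard_normal((n, d)) / np.sqrt(n)
beta_star = np.zeros(d)
beta_star[:k] = 1.0                             # sparse ground truth
y = X @ beta_star                               # noiseless labels

def train(alpha, steps=200_000, tol=1e-8):
    """Gradient descent on L(u, v) = ||X (u**2 - v**2) - y||^2 / (2 n)."""
    lr = 0.1 / (1.0 + 4.0 * alpha**2)           # smaller steps for larger init
    u = np.full(d, alpha)
    v = np.full(d, alpha)                       # beta = u**2 - v**2 starts at 0
    for _ in range(steps):
        r = X @ (u**2 - v**2) - y               # residual
        g = X.T @ r / n                         # gradient w.r.t. beta
        u, v = u - lr * 2.0 * g * u, v + lr * 2.0 * g * v
        if np.abs(r).max() < tol:               # stop once we interpolate
            break
    return u**2 - v**2

for alpha in (10.0, 1.0, 0.01):
    beta = train(alpha)
    print(f"alpha={alpha:6.2f}  ||beta||_1={np.abs(beta).sum():6.3f}  "
          f"||beta||_2={np.linalg.norm(beta):6.3f}  "
          f"dist to beta*={np.linalg.norm(beta - beta_star):6.3f}")
```

Under these assumptions one would expect the large-alpha runs to return a dense interpolant that stays far from the sparse target (kernel-like behavior, favoring small L2 norm) and the small-alpha run to return a solution close to beta_star with small L1 norm (rich behavior), with alpha = 1 somewhere in between.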