Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method
Published in: | Advances in Neural Information Processing Systems 2021-12 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | In this work, we investigate the asymptotic spectral density of the random
feature matrix $M = Y Y^\ast$ with $Y = f(WX)$ generated by a
single-hidden-layer neural network, where $W$ and $X$ are random rectangular
matrices with i.i.d. centred entries and $f$ is a non-linear smooth function
which is applied entry-wise. We prove that the Stieltjes transform of the
limiting spectral distribution approximately satisfies a quartic
self-consistent equation, which is exactly the equation obtained by
[Pennington, Worah] and [Benigni, Péché] with the moment method. We extend
the previous results to the case of additive bias $Y=f(WX+B)$, where $B$ is an
independent rank-one Gaussian random matrix, which more closely models the neural
network infrastructures encountered in practice. Our key finding is that in the
case of additive bias it is impossible to choose an activation function
preserving the layer-to-layer singular value distribution, in sharp contrast to
the bias-free case where a simple integral constraint is sufficient to achieve
isospectrality. To obtain the asymptotics for the empirical spectral density we
follow the resolvent method from random matrix theory via the cumulant
expansion. We find that this approach is more robust and less combinatorial
than the moment method and expect that it will also apply to models where the
combinatorics of the moment method become intractable. The resolvent method has
been widely employed, but, in contrast to previous works, it is applied here to
non-linear random matrices. |
---|---|
ISSN: | 1049-5258 |
DOI: | 10.48550/arxiv.2105.05115 |
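
As a rough numerical illustration of the objects named in the abstract, the following NumPy sketch samples the random feature matrix $M = Y Y^\ast$ with $Y = f(WX)$ or $Y = f(WX+B)$, computes its eigenvalues, and evaluates the empirical Stieltjes transform. It is not taken from the paper: the Gaussian entries, the $1/\sqrt{n_0}$ scaling of $W$, the $1/m$ normalisation of $M$, the construction of $B$ as one Gaussian bias vector repeated across all columns, the choice of `tanh`, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def random_feature_spectrum(n0, n1, m, f, bias_std=0.0, seed=0):
    """Eigenvalues of M = Y Y^T / m with Y = f(W X + B).

    Assumed conventions (not from the paper): W is n1 x n0 with
    N(0, 1/n0) entries, X is n0 x m with N(0, 1) entries, and
    B = b 1^T for a Gaussian vector b, which makes B rank one.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n1, n0)) / np.sqrt(n0)
    X = rng.standard_normal((n0, m))
    Z = W @ X
    if bias_std > 0:
        # One bias value per hidden unit, shared by all m samples.
        b = bias_std * rng.standard_normal((n1, 1))
        Z = Z + b  # broadcasting adds b to every column
    Y = f(Z)  # entry-wise non-linearity
    M = (Y @ Y.T) / m
    # M is real symmetric, so eigvalsh returns real eigenvalues.
    return np.linalg.eigvalsh(M)


def empirical_stieltjes(eigs, z):
    """Empirical Stieltjes transform: mean of 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigs - z))


if __name__ == "__main__":
    eigs_free = random_feature_spectrum(400, 300, 500, np.tanh)
    eigs_bias = random_feature_spectrum(400, 300, 500, np.tanh, bias_std=1.0)
    z = 1.0 + 0.1j
    print("bias-free:", empirical_stieltjes(eigs_free, z))
    print("with bias:", empirical_stieltjes(eigs_bias, z))
```

A histogram of the returned eigenvalues gives the empirical spectral density at finite size; comparing the bias-free and biased runs only illustrates that the spectrum changes and does not verify the self-consistent equation discussed in the abstract.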