Neural Networks Learn Statistics of Increasing Complexity
Format: Article
Language: English
Abstract: The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally, we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.
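As a rough illustration of the first claim: the maximum-entropy distribution with a prescribed mean and covariance is a Gaussian, so one way to probe the DSB is to sample class-conditional Gaussians fitted to the training inputs and track a classifier's accuracy on those surrogates over training. The sketch below is not taken from the linked repository; it assumes a generic PyTorch classifier that accepts flattened inputs, and `gaussian_surrogate_accuracy` is a hypothetical helper name.

```python
# Minimal sketch: evaluate a classifier on maximum-entropy surrogates of its
# training data. Matching the first two moments of each class amounts to
# sampling from N(mu_c, Sigma_c) estimated from that class's training inputs.
# `model`, `inputs`, and `labels` are assumed to be supplied by the caller.
import torch


@torch.no_grad()
def gaussian_surrogate_accuracy(model, inputs, labels, n_samples=512):
    """Mean per-class accuracy of `model` on Gaussian samples whose mean and
    covariance match each class.

    inputs: (N, D) float tensor of flattened training inputs
    labels: (N,) long tensor of class labels
    """
    accs = []
    for c in labels.unique():
        x_c = inputs[labels == c]                  # training samples of class c
        mu = x_c.mean(dim=0)
        cov = torch.cov(x_c.T) + 1e-4 * torch.eye(x_c.shape[1], dtype=x_c.dtype)
        dist = torch.distributions.MultivariateNormal(mu, covariance_matrix=cov)
        fake = dist.sample((n_samples,))           # max-entropy surrogates
        preds = model(fake).argmax(dim=-1)
        accs.append((preds == c).float().mean().item())
    return sum(accs) / len(accs)
```

Running this probe at successive checkpoints would show whether the network's accuracy on the moment-matched surrogates rises early in training and degrades later, as the abstract describes.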
DOI: 10.48550/arxiv.2402.04362
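For the class-editing experiment described in the abstract, a standard closed-form result is useful: the optimal transport map between two Gaussians is affine, so under a Gaussian approximation of two classes it pushes the mean and covariance of one class onto the other. The NumPy/SciPy sketch below illustrates that textbook formula only; it assumes flattened samples, and `gaussian_ot_map` is a hypothetical helper rather than the paper's exact editing procedure.

```python
# Sketch: edit the first two moments of class-A samples to match class B using
# the closed-form optimal transport map between Gaussians,
#   T(x) = mu_B + A (x - mu_A),
#   A = Sigma_A^{-1/2} (Sigma_A^{1/2} Sigma_B Sigma_A^{1/2})^{1/2} Sigma_A^{-1/2}.
import numpy as np
from scipy.linalg import sqrtm


def gaussian_ot_map(x_a: np.ndarray, x_b: np.ndarray) -> np.ndarray:
    """Map samples x_a (N_a, D) so their mean/covariance match those of x_b (N_b, D)."""
    mu_a, mu_b = x_a.mean(0), x_b.mean(0)
    cov_a = np.cov(x_a, rowvar=False) + 1e-6 * np.eye(x_a.shape[1])
    cov_b = np.cov(x_b, rowvar=False) + 1e-6 * np.eye(x_b.shape[1])

    root_a = np.real(sqrtm(cov_a))                    # Sigma_A^{1/2}
    inv_root_a = np.linalg.inv(root_a)                # Sigma_A^{-1/2}
    middle = np.real(sqrtm(root_a @ cov_b @ root_a))  # inner matrix square root
    transport = inv_root_a @ middle @ inv_root_a      # linear part of the OT map

    return mu_b + (x_a - mu_a) @ transport.T
```

Feeding the mapped samples to checkpoints from different stages of training would test whether early-training networks classify them as the target class while later checkpoints do not.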