Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data
Format: Article
Language: English
Abstract: Recent years have witnessed the huge success of deep neural
networks (DNNs) in various tasks of computer vision and text processing.
Interestingly, these DNNs with a massive number of parameters share similar
structural properties in their feature representations and last-layer
classifiers at the terminal phase of training (TPT). Specifically, if the
training data are balanced (each class has the same number of samples), it
is observed that the feature vectors of samples from the same class converge
to their corresponding in-class mean features, and that the pairwise angles
between the class means are all equal. This fascinating phenomenon is known
as Neural Collapse (NC), first termed by Papyan, Han, and Donoho in 2019.
Many recent works seek to theoretically explain this phenomenon by adopting
the so-called unconstrained feature model (UFM). In this paper, we study the
extension of the NC phenomenon to imbalanced data under the cross-entropy
loss function in the context of the unconstrained feature model. Our
contributions are multi-fold compared with the state-of-the-art results:
(a) we show that the feature vectors still exhibit the collapse phenomenon,
i.e., the features within the same class collapse to the same mean vector;
(b) the mean feature vectors no longer form an equiangular tight frame;
instead, their pairwise angles depend on the sample sizes; (c) we precisely
characterize the sharp threshold at which minority collapse (the feature
vectors of the minority classes collapse to one single vector) takes place;
(d) finally, we argue that the effect of the imbalance in the data sizes
diminishes as the sample size grows. Our results provide a complete picture
of NC under the cross-entropy loss for imbalanced data. Numerical
experiments confirm our theoretical analysis.
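For context, the unconstrained feature model treats the last-layer features
as free optimization variables rather than as outputs of a trained backbone.
A common regularized formulation from the UFM literature is sketched below;
the weight-decay parameters $\lambda_W$, $\lambda_H$ and the per-class
counts $n_k$ are standard notation in that literature, not symbols taken
verbatim from this paper.

```latex
% Regularized UFM under cross-entropy: the features h_{k,i} and the
% classifier W are both free variables; n_k is the (possibly imbalanced)
% number of samples in class k.
\[
\min_{W,\,H}\;
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}
    \mathcal{L}_{\mathrm{CE}}\bigl(W h_{k,i},\,k\bigr)
  + \frac{\lambda_W}{2}\|W\|_F^2
  + \frac{\lambda_H}{2}\|H\|_F^2,
\qquad
\mathcal{L}_{\mathrm{CE}}(z,k) = -\log\frac{e^{z_k}}{\sum_{j=1}^{K}e^{z_j}},
\quad
N=\sum_{k=1}^{K}n_k.
\]
```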
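Claims (a) and (b) of the abstract can be probed numerically. The sketch
below is illustrative only, not the authors' code: it runs plain gradient
descent on the regularized UFM with imbalanced class sizes and then reports
the within-class variance and the pairwise cosines of the centered class
means. All sizes, the learning rate, and the regularization weights are
assumed values and may need tuning.

```python
# Minimal UFM experiment: optimize free features H and classifier W under
# cross-entropy with weight decay, then check the collapse metrics.
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 16                       # classes, feature dimension
n_k = [100, 100, 10, 10]           # imbalanced per-class sizes (assumed)
N = sum(n_k)
labels = np.repeat(np.arange(K), n_k)

H = 0.1 * rng.normal(size=(d, N))  # free features, one column per sample
W = 0.1 * rng.normal(size=(K, d))  # last-layer classifier
lam_W = lam_H = 1e-3               # weight-decay strengths (assumed)
lr, steps = 0.05, 50_000

Y = np.eye(K)[labels].T            # one-hot targets, shape (K, N)
for _ in range(steps):
    logits = W @ H
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=0, keepdims=True)
    G = (P - Y) / N                # gradient of mean CE w.r.t. logits
    grad_W = G @ H.T + lam_W * W
    grad_H = W.T @ G + lam_H * H
    W -= lr * grad_W
    H -= lr * grad_H

# Claim (a): within-class features collapse to their class means.
means = np.stack([H[:, labels == k].mean(axis=1) for k in range(K)])
within = np.mean([np.var(H[:, labels == k] - means[k][:, None])
                  for k in range(K)])
print("average within-class variance:", within)

# Claim (b): with balanced data the centered class means form an ETF, so
# all pairwise cosines equal -1/(K-1); under imbalance they depend on the
# class sizes.
M = means - means.mean(axis=0)
Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
print("pairwise cosines of class means:\n", np.round(Mn @ Mn.T, 3))
```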
DOI: 10.48550/arxiv.2309.09725