Compressing Features for Learning With Noisy Labels

Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024-02, Vol.35 (2), p.2124-2138
Hauptverfasser: Chen, Yingyi, Hu, Shell Xu, Shen, Xi, Ai, Chunrong, Suykens, Johan A.K
Format: Artikel
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this article, we focus on the problem of learning with noisy labels and introduce compression inductive bias to network architectures to alleviate this overfitting problem. More precisely, we revisit one classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint for its feature dropping mechanism, while Nested Dropout further learns ordered feature representations with respect to feature importance. Moreover, the trained models with compression regularization are further combined with co-teaching for performance boost. Theoretically, we conduct bias variance decomposition of the objective function under compression regularization. We analyze it for both single model and co-teaching. This decomposition provides three insights: 1) it shows that overfitting is indeed an issue in learning with noisy labels; 2) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; and 3) it gives explanations on the performance boost brought by incorporating compression regularization into co-teaching. Experiments show that our simple approach can have comparable or even better performance than the state-of-the-art methods on benchmarks with real-world label noise including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
ISSN:2162-237X