Learning From Long-Tailed Data With Noisy Labels
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Class imbalance and noisy labels are the norm rather than the exception in
many large-scale classification datasets. Nevertheless, most works in machine
learning typically assume balanced and clean data. There have been some recent
attempts to tackle, on one side, the problem of learning from noisy labels and,
on the other side, learning from long-tailed data. Each group of methods makes
simplifying assumptions about the other. Due to this separation, the proposed
solutions often underperform when both assumptions are violated. In this work,
we present a simple two-stage approach based on recent advances in
self-supervised learning to treat both challenges simultaneously. It consists
of, first, task-agnostic self-supervised pre-training, followed by
task-specific fine-tuning using an appropriate loss. Most significantly, we
find that self-supervised learning approaches are able to cope effectively with
severe class imbalance. In addition, the resulting learned representations are
also remarkably robust to label noise when fine-tuned with an imbalance- and
noise-resistant loss function. We validate our claims with experiments on
CIFAR-10 and CIFAR-100 augmented with synthetic imbalance and noise, as well as
on the large-scale, inherently noisy Clothing1M dataset. |
---|---|
DOI: | 10.48550/arxiv.2108.11096 |
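
The abstract above describes a two-stage pipeline: task-agnostic self-supervised pre-training, then fine-tuning with an imbalance- and noise-resistant loss. Below is a minimal sketch of that recipe, assuming PyTorch, a ResNet-18 backbone, a SimCLR-style NT-Xent loss for stage 1, and symmetric cross-entropy (Wang et al., 2019) as the robust loss for stage 2. These concrete choices and all function names are illustrative stand-ins, not the paper's exact configuration.

```python
# Hypothetical sketch of the two-stage approach; the paper's actual SSL
# method, backbone, loss, and hyperparameters may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views of a batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool,
                               device=z.device), float('-inf'))
    # View i in the first half is positive with view i + n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def symmetric_ce(logits, targets, num_classes, alpha=0.1, beta=1.0):
    """Symmetric cross-entropy: standard CE plus a reverse-CE term that
    down-weights confidently wrong (likely mislabeled) samples."""
    ce = F.cross_entropy(logits, targets)
    pred = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float().clamp(min=1e-4)
    rce = -(pred * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce


# Stage 1: task-agnostic self-supervised pre-training. Labels are never
# read here, so neither the class imbalance nor the label noise can bias
# this stage directly.
encoder = torchvision.models.resnet18(num_classes=128)  # fc acts as projection head
pretrain_opt = torch.optim.SGD(encoder.parameters(), lr=0.03, momentum=0.9)

def pretrain_step(x1, x2):
    """x1, x2: two random augmentations of the same image batch."""
    loss = nt_xent(encoder(x1), encoder(x2))
    pretrain_opt.zero_grad()
    loss.backward()
    pretrain_opt.step()
    return loss.item()


# Stage 2: swap the projection head for a classifier and fine-tune on the
# long-tailed, noisy labels with the robust loss.
num_classes = 10
encoder.fc = nn.Linear(encoder.fc.in_features, num_classes)
finetune_opt = torch.optim.SGD(encoder.parameters(), lr=0.01, momentum=0.9)

def finetune_step(x, y):
    loss = symmetric_ce(encoder(x), y, num_classes)
    finetune_opt.zero_grad()
    loss.backward()
    finetune_opt.step()
    return loss.item()
```

The split mirrors the claim in the abstract: stage 1 never consults the labels, so imbalance and noise cannot corrupt the learned representation, while stage 2 confines their influence to a loss chosen to resist both.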