Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation
Saved in:
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Cross-modal retrieval relies on well-matched large-scale datasets that are
laborious to build in practice. Recently, to alleviate expensive data collection,
co-occurring pairs from the Internet have been automatically harvested for training.
However, such data inevitably include mismatched pairs, i.e., noisy correspondences,
undermining supervision reliability and degrading performance. Current methods
leverage the memorization effect of deep neural networks to address noisy
correspondences, but they overconfidently focus on *similarity-guided
training with hard negatives* and suffer from self-reinforcing errors. In light
of the above, we introduce a novel noisy correspondence learning framework, namely
**S**elf-**R**einforcing **E**rrors **M**itigation (SREM).
Specifically, by viewing sample matching as a classification task within the
batch, we generate classification logits for the given sample. Instead of a
single similarity score, we refine sample filtration through energy uncertainty
and estimate the model's sensitivity to selected clean samples using swapped
classification entropy, in view of the overall prediction distribution.
Additionally, we propose cross-modal biased complementary learning to leverage
negative matches overlooked in hard-negative training, further improving model
optimization stability and curbing self-reinforcing errors. Extensive
experiments on challenging benchmarks affirm the efficacy and efficiency of
SREM. |
---|---|
DOI: | 10.48550/arxiv.2312.16478 |
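The abstract describes treating in-batch sample matching as a classification task, filtering samples via an energy-based uncertainty score, and measuring entropy over the prediction distribution. Below is a minimal sketch of those building blocks, assuming CLIP-style cosine-similarity logits with a temperature and the standard free-energy score `E(x) = -logsumexp(logits)`; the function names, the temperature value, and the exact formulations are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def in_batch_logits(img_emb, txt_emb, temperature=0.07):
    # Treat each image as a "sample" classified over the batch's texts:
    # cosine similarities scaled by a temperature act as classification logits.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    return img_emb @ txt_emb.t() / temperature

def energy_score(logits):
    # Free-energy score E(x) = -logsumexp(logits); lower energy indicates a
    # more confident overall prediction, so high-energy pairs can be filtered
    # as uncertain (potentially noisy) correspondences.
    return -torch.logsumexp(logits, dim=-1)

def prediction_entropy(logits):
    # Entropy of the softmax distribution over in-batch candidates; a flat
    # (high-entropy) distribution signals low model sensitivity to the pair.
    p = logits.softmax(dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

# Toy usage with random embeddings (hypothetical batch of 4, dim 8).
torch.manual_seed(0)
img = torch.randn(4, 8)
txt = torch.randn(4, 8)
logits = in_batch_logits(img, txt)       # shape (4, 4)
energies = energy_score(logits)          # shape (4,)
entropies = prediction_entropy(logits)   # shape (4,)
```

Entropy is computed on image-to-text logits here; the "swapped" variant in the abstract would additionally evaluate the text-to-image direction (i.e., `logits.t()`), which this sketch leaves out.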