Boosting Robust Learning Via Leveraging Reusable Samples in Noisy Web Data


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2023, Vol. 25, p. 3284-3295
Main Authors: Sun, Zeren; Yao, Yazhou; Wei, Xiu-Shen; Shen, Fumin; Zhang, Jian; Hua, Xian-Sheng
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Summary: Webly-supervised fine-grained visual classification (FGVC) has attracted increasing attention in recent years because of the unaffordable cost of obtaining correctly-labeled large-scale fine-grained datasets. However, due to label noise in web images and the high memorization capacity of deep neural networks, training deep fine-grained (FG) models directly on web images tends to yield inferior recognition performance. To alleviate this issue, loss correction methods in the literature estimate the noise transition matrix, but inevitable false corrections cause accumulated errors. Sample selection methods instead identify clean ("easy") samples via the small-loss criterion, which alleviates such accumulated errors; however, "hard" and mislabeled examples that could also boost the robustness of FG models are dropped as well. To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models with web images. Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the network. Furthermore, to endow our model with the capability to capture richer and more discriminative feature representations, we propose a cross-layer attention-based feature refinement (CLAR) block. We demonstrate the superiority of the proposed approach from both theoretical and experimental perspectives.
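The selection-and-correction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact CRSSC algorithm; it is a generic illustration assuming per-sample losses and softmax outputs: small-loss samples are kept as "clean", and among the remaining high-loss samples, those the network predicts with high certainty (here approximated by the peak softmax probability; `select_rate` and `certainty_thresh` are hypothetical parameters) are relabeled with the predicted class and reused, while the rest are dropped.

```python
import numpy as np

def select_and_correct(losses, probs, select_rate=0.7, certainty_thresh=0.9):
    """Sketch of small-loss selection plus certainty-based relabeling.

    losses : (N,) per-sample training losses
    probs  : (N, C) softmax predictions from the network
    Returns indices of clean samples, indices of reusable samples,
    and the corrected (pseudo) labels for the reusable samples.
    """
    n_keep = int(len(losses) * select_rate)
    order = np.argsort(losses)
    clean_idx = order[:n_keep]        # smallest losses -> treated as clean
    rest_idx = order[n_keep:]         # large losses -> correction candidates

    # Peak softmax probability as a simple certainty proxy.
    certainty = probs[rest_idx].max(axis=1)
    reusable_idx = rest_idx[certainty >= certainty_thresh]
    corrected_labels = probs[reusable_idx].argmax(axis=1)
    return clean_idx, reusable_idx, corrected_labels
```

In a training loop, the network would then be updated on the clean samples with their given labels plus the reusable samples with their corrected labels, while the remaining high-loss, low-certainty samples are discarded for that epoch.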
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2022.3158001