Robust Semi-Supervised Learning With Multi-Consistency and Data Augmentation
Published in: IEEE Transactions on Consumer Electronics, 2024-02, Vol. 70 (1), p. 414-424
Main authors: , , ,
Format: Article
Language: English
Abstract: In this paper, we address the problem of noisy datasets by proposing a dual screening scheme to improve the performance of models trained on two public noisy datasets: Clothing1M and Animal-10N. Because both datasets were generated by Web crawlers, their label error levels cannot be estimated. We use a warm-up model to separate the data into labeled and unlabeled subsets, which are then classified by multi-model consistency. We select the consistent samples and assign them pseudo-labels for training, while the remaining samples are treated as noisy and excluded from supervised training. This approach reduces the impact of noisy data and mislabeling. To improve the model's robustness, we combine clean data and unlabeled data with strong data augmentation and train them using the Mixup algorithm. Experimental results show that the proposed methods boost classification performance: accuracy on Clothing1M is 0.1% higher and accuracy on Animal-10N is 2% higher than the respective state-of-the-art methods. The main contributions of this paper are: 1) adding strong data augmentation to strengthen the model, 2) using multi-consistency to reduce the impact of noisy data, and 3) boosting performance through semi-supervised learning.
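The pipeline sketched in the abstract (multi-model consistency screening, pseudo-labeling of the consistent samples, and Mixup training on selected plus strongly augmented data) can be illustrated with a short PyTorch sketch. Everything below is an assumption made for illustration, not the paper's implementation: the agreement-and-confidence rule, the 0.9 threshold, the Beta(0.75, 0.75) Mixup coefficient, and the helper names `multi_consistency_split`, `mixup`, and `train_step` are all hypothetical.

```python
# Illustrative sketch only; thresholds, the Mixup alpha, and the use of the
# first model as the "primary" network are assumptions, not the paper's settings.
import torch
import torch.nn.functional as F


def multi_consistency_split(models, images, threshold=0.9):
    """Split a batch by multi-model consistency.

    A sample is kept as "clean" when every model predicts the same class and
    every model's confidence exceeds the threshold; the shared prediction is
    returned as its pseudo-label. The remaining samples are treated as noisy.
    """
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(images), dim=1) for m in models])  # (M, B, C)
        conf, preds = probs.max(dim=2)                                      # both (M, B)
        agree = (preds == preds[0]).all(dim=0)           # all models give the same class
        confident = conf.min(dim=0).values > threshold   # the least confident model is still sure
        keep = agree & confident
    idx = torch.arange(images.size(0), device=images.device)
    return idx[keep], preds[0][keep], idx[~keep]


def mixup(x, y, num_classes, alpha=0.75):
    """Standard Mixup: convex combination of inputs and one-hot targets."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    y_onehot = F.one_hot(y, num_classes).float()
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]


def train_step(models, optimizer, images, strong_aug_images, num_classes):
    """One training step: screen the batch, pseudo-label the consistent part,
    then train the primary model on a Mixup of the selected samples and their
    strongly augmented views."""
    clean_idx, pseudo, _noisy_idx = multi_consistency_split(models, images)
    if clean_idx.numel() < 2:
        return None  # nothing reliable enough to train on in this batch
    x = torch.cat([images[clean_idx], strong_aug_images[clean_idx]])
    y = torch.cat([pseudo, pseudo])
    x_mix, y_mix = mixup(x, y, num_classes)
    logits = models[0](x_mix)
    # Cross-entropy with soft (mixed) targets.
    loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the setting described by the abstract, the screening models would be the warm-up networks trained on the noisy labels; in this sketch any list of classifiers sharing the same label space would work.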
ISSN: 0098-3063, 1558-4127
DOI: 10.1109/TCE.2023.3331700