mCRF and mRD: Two Classification Methods Based on a Novel Multiclass Label Noise Filtering Learning Framework

Mitigating label noise is a crucial problem in classification. Noise filtering is an effective method of dealing with label noise which does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more diffi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems 2022-07, Vol.PP (7), p.1-15
Hauptverfasser: Xia, Shuyin, Chen, Baiyun, Wang, Guoyin, Zheng, Yong, Gao, Xinbo, Giem, Elisabeth, Chen, Zizhong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Mitigating label noise is a crucial problem in classification. Noise filtering is an effective method of dealing with label noise which does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more difficult counterpart problem of multiclass classification relatively unexplored. To remedy this deficit, we present a definition for label noise in a multiclass setting and propose a general framework for a novel label noise filtering learning method for multiclass classification. Two examples of noise filtering methods for multiclass classification, multiclass complete random forest (mCRF) and multiclass relative density, are derived from their binary counterparts using our proposed framework. In addition, to optimize the NI_threshold hyperparameter in mCRF, we propose two new optimization methods: a new voting cross-validation method and an adaptive method that employs a 2-means clustering algorithm. Furthermore, we incorporate SMOTE into our label noise filtering learning framework to handle the ubiquitous problem of imbalanced data in multiclass classification. We report experiments on both synthetic data sets and UCI benchmarks to demonstrate our proposed methods are highly robust to label noise in comparison with state-of-the-art baselines. All code and data results are available at https://github.com/syxiaa/Multiclass-Label-Noise-Filtering-Learning.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2020.3047046