A Training Data Set Cleaning Method by Classification Ability Ranking for the [Formula Omitted]-Nearest Neighbor Classifier

The [Formula Omitted]-nearest neighbor (KNN) rule is a successful technique in pattern classification due to its simplicity and effectiveness. As a supervised classifier, KNN classification performance usually suffers from low-quality samples in the training data set. Thus, training data set cleanin...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems 2020-01, Vol.31 (5), p.1544
Hauptverfasser: Wang, Yidi, Pan, Zhibin, Pan, Yiwei
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The [Formula Omitted]-nearest neighbor (KNN) rule is a successful technique in pattern classification due to its simplicity and effectiveness. As a supervised classifier, KNN classification performance usually suffers from low-quality samples in the training data set. Thus, training data set cleaning (TDC) methods are needed for enhancing the classification accuracy by cleaning out noisy, or even wrong, samples in the original training data set. In this paper, we propose a classification ability ranking (CAR)-based TDC method to improve the performance of a KNN classifier, namely CAR-based TDC method. The proposed classification ability function ranks a training sample in terms of its contribution to correctly classify other training samples as a KNN through the leave-one-out (LV1) strategy in the cleaning stage. The training sample that likely misclassifies the other samples during the KNN classifications according to the LV1 strategy is considered to have lower classification ability and will be cleaned out from the original training data set. Extensive experiments, based on ten real-world data sets, show that the proposed CAR-based TDC method can significantly reduce the classification error rates of KNN-based classifiers, while reducing computational complexity thanks to a smaller cleaned training data set.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2019.2920864