A Distributed Rough Evidential K-NN Classifier: Integrating Feature Reduction and Classification

Bibliographic Details
Published in: IEEE Transactions on Fuzzy Systems, 2021-08, Vol. 29 (8), pp. 2322-2335
Main authors: Su, Zhi-gang, Hu, Qinghua, Denoeux, Thierry
Format: Article
Language: English
Online access: Order full text
Description
Summary: The Evidential K-Nearest Neighbor (EK-NN) classification rule provides a global treatment of imperfect knowledge in class labels, but still suffers from the curse of dimensionality as well as runtime and memory restrictions when performing the nearest-neighbor search, in particular for large and high-dimensional data. To avoid the curse of dimensionality, this article first proposes a rough evidential K-NN (REK-NN) classification rule in the framework of rough set theory. Based on a reformulated K-NN rough set model, REK-NN selects features and thus reduces complexity by minimizing a proposed neighborhood pignistic decision error rate, which considers both the Bayes decision error and spatial information among samples in feature space. In contrast to existing rough set-based feature selection methods, REK-NN is a synchronized rule rather than a stepwise one, in the sense that feature selection and learning are performed simultaneously. In order to further handle data with large sample sizes, we derive a distributed REK-NN method and implement it in Apache Spark. A theoretical analysis of the classifier's generalization error bound is finally presented. It is shown that the distributed REK-NN achieves good performance while drastically reducing the number of features and consuming less runtime and memory. Numerical experiments conducted on real-world datasets validate our conclusions.
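The EK-NN rule underlying this work can be illustrated with a minimal sketch. This is not the paper's REK-NN method (no rough-set feature selection or Spark distribution is shown); it only sketches the classical evidential K-NN rule: each neighbor contributes a mass function supporting its own class with strength decaying in the distance, masses are pooled with Dempster's rule, and the class with the highest pignistic probability is chosen. The parameter values `alpha=0.95` and `gamma=0.5` are illustrative defaults, not values from the paper.

```python
import numpy as np

def eknn_predict(X_train, y_train, x, K=5, alpha=0.95, gamma=0.5):
    """Sketch of the evidential K-NN classification rule.

    Each of the K nearest neighbors of x induces the mass function
    m({class}) = alpha * exp(-gamma * d^2), m(Omega) = 1 - m({class}),
    where d is the distance to x. The K masses are combined with
    Dempster's rule, and the class maximizing the pignistic
    probability is returned.
    """
    classes = np.unique(y_train)
    n_cls = len(classes)
    d = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(d)[:K]

    # Combined mass: indices 0..n_cls-1 hold singleton masses,
    # the last index holds the mass assigned to the frame Omega.
    m = np.zeros(n_cls + 1)
    m[-1] = 1.0  # start from the vacuous mass function
    for i in neighbors:
        q = np.searchsorted(classes, y_train[i])
        s = alpha * np.exp(-gamma * d[i] ** 2)
        mi = np.zeros(n_cls + 1)
        mi[q] = s          # support for the neighbor's class
        mi[-1] = 1.0 - s   # remaining mass on Omega
        # Dempster's rule for this simple focal structure:
        # {w_q} survives via {w_q}&{w_q}, {w_q}&Omega, Omega&{w_q};
        # Omega survives only via Omega&Omega; then renormalize.
        combined = np.zeros(n_cls + 1)
        combined[:-1] = m[:-1] * (mi[:-1] + mi[-1]) + m[-1] * mi[:-1]
        combined[-1] = m[-1] * mi[-1]
        m = combined / combined.sum()

    # Pignistic transform: distribute the mass on Omega uniformly
    # over the singletons, then pick the most probable class.
    betp = m[:-1] + m[-1] / n_cls
    return classes[np.argmax(betp)]
```

On two well-separated clusters, a query point near one cluster is assigned to that cluster's class with near-certain pignistic probability.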
ISSN: 1063-6706
1941-0034
DOI: 10.1109/TFUZZ.2020.2998502