A Distributed Rough Evidential K-NN Classifier: Integrating Feature Reduction and Classification

Bibliographic Details
Published in: IEEE Transactions on Fuzzy Systems, 2021-08, Vol. 29 (8), pp. 2322-2335
Main authors: Su, Zhi-gang, Hu, Qinghua, Denoeux, Thierry
Format: Article
Language: English
Online access: Order full text
Description
Summary: The Evidential K-Nearest Neighbor (EK-NN) classification rule provides a global treatment of imperfect knowledge in class labels, but still suffers from the curse of dimensionality as well as runtime and memory restrictions when performing the nearest-neighbor search, in particular for large and high-dimensional data. To avoid the curse of dimensionality, this article first proposes a rough evidential K-NN (REK-NN) classification rule in the framework of rough set theory. Based on a reformulated K-NN rough set model, REK-NN selects features and thus reduces complexity by minimizing a proposed neighborhood pignistic decision error rate, which considers both the Bayes decision error and spatial information among samples in feature space. In contrast to existing rough set-based feature selection methods, REK-NN is a synchronized rule rather than a stepwise one, in the sense that feature selection and learning are performed simultaneously. In order to further handle data with large sample sizes, we derive a distributed REK-NN method and implement it in Apache Spark. A theoretical analysis of the classifier's generalization error bound is finally presented. It is shown that the distributed REK-NN achieves good performance while drastically reducing the number of features and consuming less runtime and memory. Numerical experiments conducted on real-world datasets validate our conclusions.
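The EK-NN rule underlying this work can be illustrated with a minimal sketch. This is not the paper's REK-NN method (no rough-set feature selection or Spark distribution is shown); it only sketches the classical evidential K-NN rule: each neighbor contributes a mass function supporting its own class with strength decaying in the distance, masses are pooled with Dempster's rule, and the class with the highest pignistic probability is chosen. The parameter values `alpha=0.95` and `gamma=0.5` are illustrative defaults, not values from the paper.

```python
import numpy as np

def eknn_predict(X_train, y_train, x, K=5, alpha=0.95, gamma=0.5):
    """Sketch of the evidential K-NN classification rule.

    Each of the K nearest neighbors of x induces the mass function
    m({class}) = alpha * exp(-gamma * d^2), m(Omega) = 1 - m({class}),
    where d is the distance to x. The K masses are combined with
    Dempster's rule, and the class maximizing the pignistic
    probability is returned.
    """
    classes = np.unique(y_train)
    n_cls = len(classes)
    d = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(d)[:K]

    # Combined mass: indices 0..n_cls-1 hold singleton masses,
    # the last index holds the mass assigned to the frame Omega.
    m = np.zeros(n_cls + 1)
    m[-1] = 1.0  # start from the vacuous mass function
    for i in neighbors:
        q = np.searchsorted(classes, y_train[i])
        s = alpha * np.exp(-gamma * d[i] ** 2)
        mi = np.zeros(n_cls + 1)
        mi[q] = s          # support for the neighbor's class
        mi[-1] = 1.0 - s   # remaining mass on Omega
        # Dempster's rule for this simple focal structure:
        # {w_q} survives via {w_q}&{w_q}, {w_q}&Omega, Omega&{w_q};
        # Omega survives only via Omega&Omega; then renormalize.
        combined = np.zeros(n_cls + 1)
        combined[:-1] = m[:-1] * (mi[:-1] + mi[-1]) + m[-1] * mi[:-1]
        combined[-1] = m[-1] * mi[-1]
        m = combined / combined.sum()

    # Pignistic transform: distribute the mass on Omega uniformly
    # over the singletons, then pick the most probable class.
    betp = m[:-1] + m[-1] / n_cls
    return classes[np.argmax(betp)]
```

On two well-separated clusters, a query point near one cluster is assigned to that cluster's class with near-certain pignistic probability.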
ISSN: 1063-6706
1941-0034
DOI: 10.1109/TFUZZ.2020.2998502