Unsupervised Feature Selection and Clustering Optimization Based on Improved Differential Evolution

Bibliographic details
Published in: IEEE Access 2019, Vol. 7, p. 140438-140450
Main authors: Li, Tao; Dong, Hongbin
Format: Article
Language: English
Description
Summary: Feature selection methods based on supervised learning have been widely studied and applied in machine learning and data mining. Unsupervised feature selection, however, remains a difficult research area because label information is unavailable, especially for clustering tasks. Irrelevant and redundant features in the original data seriously hinder the discovery of clustering structure and weaken the performance of subsequent classification. To address this problem, this paper proposes an unsupervised feature selection and clustering algorithm based on an evolutionary computing framework. First, a binary differential evolution algorithm is constructed for unsupervised feature selection. Specifically, the individuals of the population characterize the feature subspaces, and an improved Laplacian model is designed to measure the local manifold structure of each individual. Subsequently, the approximately optimal manifold structure and the corresponding feature subset are obtained. Then, a continuous differential evolution algorithm is executed on the optimized feature subset, for which an individual representation strategy and an integrated individual measure function are designed for clustering. Moreover, the predicted pseudo-labels are used for classification to further verify the validity of the clustering. The experimental results demonstrate that the proposed framework outperforms most state-of-the-art methods.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2937739
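
Illustration: the first stage named in the abstract above (binary differential evolution over feature subsets, each scored by a Laplacian measure of local manifold structure) might be sketched roughly as follows. This is not the authors' implementation. The abstract does not specify the improved Laplacian model or the binary mutation rule, so the sketch assumes a standard mean Laplacian score on a k-nearest-neighbor graph and a sigmoid-based binarization of the usual DE mutation. The names `laplacian_fitness` and `binary_de_select` and all parameter defaults are hypothetical.

```python
import numpy as np


def laplacian_fitness(X, mask, k=5):
    """Mean Laplacian score of the selected features (lower is better).

    Assumption: a plain Laplacian score stands in for the paper's
    improved Laplacian model, which the abstract does not describe.
    """
    if not mask.any():
        return np.inf  # an empty feature subset is never acceptable
    Z = X[:, mask]
    n = Z.shape[0]
    # Pairwise squared Euclidean distances in the selected subspace.
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    sigma = sq.mean() + 1e-12
    # k nearest neighbors per sample; the diagonal is masked so a
    # sample is never its own neighbor.
    masked = sq + np.diag(np.full(n, np.inf))
    cols = np.argsort(masked, axis=1)[:, :k].ravel()
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / sigma)  # heat-kernel weights
    W = np.maximum(W, W.T)                           # symmetrize the graph
    d = W.sum(axis=1)
    L = np.diag(d) - W                               # graph Laplacian
    scores = []
    for f in Z.T:
        f = f - (f @ d) / d.sum()                    # degree-weighted centering
        denom = f @ (d * f)
        scores.append((f @ L @ f) / denom if denom > 1e-12 else np.inf)
    return float(np.mean(scores))


def binary_de_select(X, pop_size=12, generations=20, F=0.8, CR=0.5, seed=0):
    """Binary DE over feature masks; returns (best_mask, best_fitness_history)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5       # random boolean masks
    fit = np.array([laplacian_fitness(X, m) for m in pop])
    history = [float(fit.min())]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, size=3, replace=False)
            # Real-valued DE/rand/1 mutation on the 0/1 encodings.
            v = pop[a].astype(float) + F * (pop[b].astype(float) - pop[c].astype(float))
            # Assumed binarization: squash to a probability, then sample bits.
            prob = 1.0 / (1.0 + np.exp(-4.0 * (v - 0.5)))
            mutant = rng.random(n_feat) < prob
            # Binomial crossover with one guaranteed mutant position.
            cross = rng.random(n_feat) < CR
            cross[rng.integers(n_feat)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = laplacian_fitness(X, trial)
            if f_trial <= fit[i]:                    # greedy selection
                pop[i], fit[i] = trial, f_trial
        history.append(float(fit.min()))
    best = int(np.argmin(fit))
    return pop[best].copy(), history
```

Calling `binary_de_select(X)` on an array of shape (samples, features) returns a boolean mask over the columns and the best fitness per generation. Because of the greedy selection step, that history never increases. The second stage in the abstract (continuous differential evolution for clustering on the selected subset) is not covered here.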