Multi-cluster nonlinear unsupervised feature selection via joint manifold learning and generalized Lasso
Published in: Expert systems with applications, 2024-12, Vol. 255, p. 124502, Article 124502
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Due to the scarcity of data labels, unsupervised feature selection has received considerable attention in recent years. While many unsupervised feature selection methods are capable of selecting relevant features, they often fail to comprehensively consider the impact of both local and global information of the data on feature selection, nor can they effectively handle the complex nonlinear relationships commonly found in real-world data. As a result, suboptimal feature subsets are often selected. In this paper, inspired by the Uniform Manifold Approximation and Projection (UMAP) manifold learning technique and the nonlinear sparse learning method based on the Feature-Wise Kernelized Lasso, we propose a novel unsupervised feature selection method called Multi-Cluster Unsupervised Nonlinear Feature Selection based on UMAP and block HSIC Lasso (MUNFS). MUNFS greatly improves the representation of high-dimensional data during dimensionality reduction and effectively handles complex nonlinear relationships in such data. Specifically, by capturing the intrinsic topology of the data, MUNFS accurately preserves the local structure of the data while keeping as much of the global structure as possible. Furthermore, the kernel-based Hilbert–Schmidt Independence Criterion (HSIC) measures the nonlinear dependency between the features and the target variables, while an l1 regularization term induces sparsity in the feature selection. This allows for a more precise assessment of the significance of each feature. Extensive experimental results on five benchmark datasets and eight hyperspectral datasets demonstrate that the MUNFS method substantially outperforms several competing feature selection methods.
Highlights:
• A novel method named MUNFS is proposed to remove irrelevant and redundant features.
• MUNFS solves the imbalance between local and global structure retention of data.
• MUNFS solves the issue of inadequate treatment of data nonlinear relationships.
• The importance of each feature can be evaluated more accurately.
• The superior performance of the MUNFS method is verified by numerous experiments.
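The abstract's sparse-learning component combines kernelized dependence measurement (HSIC) with l1 sparsity. The paper's block HSIC Lasso and the actual MUNFS implementation are not reproduced here; the sketch below is only a minimal, single-block illustration of the underlying HSIC Lasso idea — a non-negative lasso over centered, normalized feature-wise Gram matrices — with all function names, kernel bandwidths, and solver settings being my own assumptions, not the authors' code:

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    # Gram matrix of a 1-D variable under a Gaussian kernel.
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center_normalize(K):
    # Double-center (H K H) and scale to unit Frobenius norm.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    return Kc / np.linalg.norm(Kc)

def hsic_lasso_scores(X, y, lam=0.01, n_iter=200):
    """Illustrative HSIC Lasso:
       min_a 0.5 * ||Lbar - sum_k a_k Kbar_k||_F^2 + lam * sum(a),  a >= 0,
       where Kbar_k / Lbar are centered, normalized Gram matrices of
       feature k and the target. Nonzero a_k marks a selected feature."""
    n, d = X.shape
    Lbar = center_normalize(gaussian_gram(y))
    Ks = [center_normalize(gaussian_gram(X[:, k])) for k in range(d)]
    # Precompute inner products between Gram matrices.
    M = np.array([[np.sum(Ki * Kj) for Kj in Ks] for Ki in Ks])
    b = np.array([np.sum(Ki * Lbar) for Ki in Ks])
    a = np.zeros(d)
    # Cyclic coordinate descent with soft-thresholding and a >= 0.
    for _ in range(n_iter):
        for k in range(d):
            r = b[k] - M[k] @ a + M[k, k] * a[k]
            a[k] = max(0.0, (r - lam) / M[k, k])
    return a
```

Because each Gram matrix is centered and normalized, the diagonal of `M` is 1 and `b[k]` is a kernel-alignment (HSIC-like) score between feature k and the target, so features that are nonlinearly dependent on the target receive large coefficients while the l1 penalty zeroes out irrelevant ones.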
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2024.124502