Unsupervised feature selection using sparse manifold learning: Auto-encoder approach

Bibliographic details
Published in: Information processing & management, 2025-01, Vol. 62 (1), p. 103923, Article 103923
Main authors: Moslemi, Amir; Jamshidi, Mina
Format: Article
Language: English
Description
Summary: Feature selection techniques are widely used as a preprocessing step when training machine learning algorithms, in order to mitigate the curse of dimensionality, overfitting, and long computation times. Projection-based methods are frequently employed for feature selection, but they exploit only linear relationships among features and therefore miss nonlinear information. Auto-encoder based techniques have recently gained traction for feature selection, yet their focus remains primarily on the encoding phase, since the selected features are derived from it. The subtle point is that the ability of an auto-encoder to obtain the most discriminative features is also significantly affected by the decoding phase. To address these challenges, we propose a novel auto-encoder based feature selection method that not only extracts nonlinear information among features but also regularizes the decoding phase to improve performance. We define a new auto-encoder model that keeps the topological information of the reconstructed data close to that of the input data. The geometric structure of the input data is preserved in the projected space using a Laplacian graph, and the geometric structure of the projected space is preserved in the reconstructed space through a suitable term in the optimization problem (the abstract Laplacian graph of the reconstructed data). Keeping the abstract Laplacian graph of the reconstructed data close to the Laplacian graph of the input data affects feature selection performance, which we show experimentally. We also present an effective approach for solving the resulting objective. Since the method is mainly intended for clustering, we conducted experiments on ten benchmark datasets and assessed the proposed method using clustering accuracy and normalized mutual information (NMI). Our method considerably outperformed recent state-of-the-art techniques in terms of NMI and accuracy.
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2024.103923
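
The summary above refers to preserving the geometric structure of the input data in the projected space by means of a Laplacian graph. This record does not give the paper's objective function, so the following is only a minimal sketch of the standard graph-Laplacian smoothness penalty that such methods commonly build on, not the authors' formulation. The k-nearest-neighbour graph with heat-kernel weights, the linear encoder matrix `P`, the function names, and the row-norm feature ranking are all assumptions made for illustration.

```python
import numpy as np

def knn_graph_laplacian(X, k=3, sigma=1.0):
    """Return (W, L): symmetric k-NN affinity matrix and unnormalized Laplacian L = D - W.

    Assumption: heat-kernel weights on a k-NN graph; the paper may build its graph differently.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of sample i, excluding i itself.
        order = np.argsort(d2[i])
        nbrs = [j for j in order if j != i][:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)              # make the graph undirected
    L = np.diag(W.sum(axis=1)) - W      # degree matrix minus affinities
    return W, L

def laplacian_penalty(Z, L):
    """Smoothness penalty tr(Z^T L Z); small when graph neighbours stay close in Z."""
    return float(np.trace(Z.T @ L @ Z))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))   # toy data: 20 samples, 6 features
P = rng.normal(size=(6, 2))    # hypothetical encoder weights (6 features -> 2 latent dims)
Z = X @ P                      # projected (encoded) data

W, L = knn_graph_laplacian(X, k=3)
penalty = laplacian_penalty(Z, L)

# The penalty equals half the affinity-weighted sum of squared pairwise distances in Z.
diff = Z[:, None, :] - Z[None, :, :]
pairwise = 0.5 * np.sum(W * np.sum(diff ** 2, axis=2))

# Assumed selection rule: rank features by the row norms of the encoder weights.
scores = np.linalg.norm(P, axis=1)
ranking = np.argsort(-scores)

print(f"tr(Z^T L Z) = {penalty:.6f}, pairwise form = {pairwise:.6f}")
print("feature ranking (most to least important):", ranking.tolist())
```

In a full method of this kind, a penalty like `laplacian_penalty` would typically be added to the reconstruction loss, once for the projected data and, following the summary, once more in some form for the reconstructed data. How the paper defines that second term is not stated here.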