Network-based dimensionality reduction of high-dimensional, low-sample-size datasets

Bibliographic details
Published in: Knowledge-based systems, 2022-09, Vol. 251, p. 109180, Article 109180
Main authors: Kosztyán, Zsolt T.; Kurbucz, Marcell T.; Katona, Attila I.
Format: Article
Language: English
Description
Summary: In the field of data science, there are a variety of datasets that suffer from the high-dimensional, low-sample-size (HDLSS) problem; however, only a few dimensionality reduction methods exist that are applicable to address this type of problem, and there is no nonparametric solution to date. The purpose of this work is to develop a novel network-based (nonparametric) dimensionality reduction analysis (NDA) method that can be effectively applied to HDLSS data. First, with the NDA method, the correlation graph of variables is specified. With a modularity-based community detection method, the set of modules is specified. Then, the linear combination of variables weighted by their eigenvector centralities (EVCs), defined as LVs, is determined. In the optional phase of variable selection, variables with low EVCs and low communality are ignored. Then, the set of LVs and the set of indicators belonging to the LVs are specified using the NDA method. NDA is applied to publicly available databases and compared with principal factoring with community analysis (PFA) methods. The results show that NDA can be effectively applied to HDLSS datasets as it outperforms the existing methods in terms of interpretability. In addition, the application of NDA is easier, since there is no need to specify the number of latent variables due to its nonparametric nature.
• A novel network-based nonparametric method (NDA) is proposed to perform dimensionality reduction.
• NDA finds latent variables (LVs) by community detection of the correlation graph of indicators.
• NDA provides feature selection, ignoring indicators with low and common communalities.
• NDA provides both the set of LVs and the set of indicators belonging to the LVs.
• NDA is tested and compared with both principal component analysis and factoring analysis on publicly available databases.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2022.109180
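
The summary names three core NDA steps: building a correlation graph of the variables, finding modules with modularity-based community detection, and forming each latent variable (LV) as an EVC-weighted linear combination of the variables in a module. The Python sketch below illustrates those steps only and is not the authors' implementation. Everything the summary does not state is an assumption made for illustration: the edge weight is the squared Pearson correlation, the threshold `min_r2` and all function names are hypothetical, a naive greedy modularity merge stands in for whichever community detection method the paper uses, EVCs are computed within each module, correlation signs are ignored, and the optional variable selection phase is omitted.

```python
import numpy as np


def correlation_graph(X, min_r2=0.25):
    """Weighted adjacency matrix of variables (assumed weight: squared Pearson r)."""
    R = np.corrcoef(X, rowvar=False)
    A = R ** 2
    np.fill_diagonal(A, 0.0)
    A[A < min_r2] = 0.0  # hypothetical threshold: drop weak edges
    return A


def greedy_modularity(A):
    """Naive agglomerative modularity maximization (illustrative stand-in).

    Starts with one community per node and repeatedly merges the pair of
    communities with the largest positive modularity gain.
    """
    communities = [[i] for i in range(A.shape[0])]
    two_m = A.sum()  # twice the total edge weight
    if two_m == 0:
        return communities
    k = A.sum(axis=1)  # weighted degrees
    while len(communities) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(communities)):
            for b in range(a + 1, len(communities)):
                ca, cb = communities[a], communities[b]
                e_ab = A[np.ix_(ca, cb)].sum() / two_m
                gain = 2.0 * (e_ab - k[ca].sum() * k[cb].sum() / two_m ** 2)
                if gain > best_gain + 1e-12:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity
            break
        a, b = best_pair
        communities[a] = communities[a] + communities[b]
        del communities[b]
    return communities


def nda_sketch(X, min_r2=0.25):
    """Return (modules, latent): variable indices per module and LV scores."""
    A = correlation_graph(X, min_r2)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    modules, columns = [], []
    for members in greedy_modularity(A):
        if len(members) < 2:  # assumption: isolated variables form no LV
            continue
        idx = np.array(sorted(members))
        # Eigenvector centrality inside the module: principal eigenvector
        # of the module's symmetric, nonnegative adjacency submatrix.
        _, vecs = np.linalg.eigh(A[np.ix_(idx, idx)])
        evc = np.abs(vecs[:, -1])
        weights = evc / evc.sum()
        modules.append(idx.tolist())
        columns.append(Z[:, idx] @ weights)
    if columns:
        latent = np.column_stack(columns)
    else:
        latent = np.empty((X.shape[0], 0))
    return modules, latent


# Toy HDLSS-style data: 8 samples, 9 variables built from 3 orthogonal factors.
f1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
f2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
f3 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
noise = [f1 * f2, f1 * f3, f2 * f3]  # small deterministic perturbations
X = np.column_stack([f + 0.1 * e for f in (f1, f2, f3) for e in noise])

modules, latent = nda_sketch(X)
print(modules)       # variable indices grouped by module
print(latent.shape)  # (samples, number of LVs found)
```

Note that the number of LVs is never passed in: it falls out of the community structure, which mirrors the summary's point that the number of latent variables does not need to be specified.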