Kick-one-out-based variable selection method for Euclidean distance-based classifier in high-dimensional settings

Bibliographic details
Published in: Journal of Multivariate Analysis, 2021-07, Vol. 184, p. 104756, Article 104756
Authors: Nakagawa, Tomoyuki; Watanabe, Hiroki; Hyodo, Masashi
Format: Article
Language: English
Online access: Full text
Description
Abstract: This paper presents a variable selection method for the Euclidean distance-based classifier in high-dimensional settings. Our concern is that the expected probabilities of misclassification (EPMC) of the Euclidean distance-based classifier may increase with the dimension when redundant variables are included among the feature values. First, we show that the Euclidean distance-based classifier restricted to the non-redundant variables attains a smaller asymptotic EPMC than the classifier that uses all variables. Next, we derive a kick-one-out-based variable selection method that helps reduce the EPMC, and we prove its selection consistency in the high-dimensional setting. Finally, we conduct a Monte Carlo simulation study to examine the finite-sample performance of the proposed selection method. The simulation results show that the method frequently selects the set of non-redundant variables, and that the discrimination rules constructed from the selected variables attain a smaller EPMC than the rules constructed from all variables.
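As a rough illustration of the two ingredients mentioned in the abstract, the following Python sketch shows a Euclidean distance-based discrimination rule together with a schematic kick-one-out loop. The separation criterion, its bias correction, and the keep/drop rule below are simplifying assumptions made here for illustration only; the paper's actual selection statistic is derived from the asymptotic EPMC and is not reproduced here.

```python
import numpy as np

def euclidean_rule(x, mean1, mean2):
    """Euclidean distance-based discrimination rule: assign x to class 1
    if x is closer to the class-1 sample mean than to the class-2 one."""
    return 1 if np.sum((x - mean1) ** 2) < np.sum((x - mean2) ** 2) else 2

def separation_estimate(X1, X2, idx):
    """Illustrative plug-in estimate of the squared mean distance
    ||mu1 - mu2||^2 over the variables in idx, with a simple bias
    correction for the sampling noise of the two sample means.
    This is NOT the selection statistic derived in the paper."""
    n1, n2 = X1.shape[0], X2.shape[0]
    d = X1[:, idx].mean(axis=0) - X2[:, idx].mean(axis=0)
    pooled_var = ((X1[:, idx].var(axis=0, ddof=1) * (n1 - 1)
                   + X2[:, idx].var(axis=0, ddof=1) * (n2 - 1))
                  / (n1 + n2 - 2))
    return np.sum(d ** 2) - (1.0 / n1 + 1.0 / n2) * np.sum(pooled_var)

def kick_one_out(X1, X2, criterion=separation_estimate):
    """Schematic kick-one-out loop: remove one variable at a time and
    keep it only if removing it lowers the separation criterion."""
    p = X1.shape[1]
    all_idx = np.arange(p)
    full = criterion(X1, X2, all_idx)
    keep = [j for j in range(p)
            if criterion(X1, X2, np.delete(all_idx, j)) < full]
    return np.array(keep)

# Toy usage: 3 informative variables among 50, pure noise elsewhere.
rng = np.random.default_rng(0)
p, n1, n2 = 50, 30, 30
mu1, mu2 = np.zeros(p), np.zeros(p)
mu2[:3] = 1.5                      # only the first 3 variables separate the classes
X1 = rng.normal(mu1, 1.0, size=(n1, p))
X2 = rng.normal(mu2, 1.0, size=(n2, p))
print(kick_one_out(X1, X2))        # the informative indices 0, 1, 2 should be among those kept
```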
ISSN: 0047-259X, 1095-7243
DOI: 10.1016/j.jmva.2021.104756