Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters

Bibliographic Details
Published in:	Algorithms 2021-02, Vol.14 (2), p.66
Main Authors:	Champion, Camille; Brunet, Anne-Claire; Burcelin, Rémy; Loubes, Jean-Michel; Risser, Laurent
Format:	Article
Language:	English
Subjects:	Algorithms; Clustering; Clusters; complex data; Complex systems; Complex variables; feature selection; graph clustering; interpretable machine learning; Machine Learning; Mathematics; Optimization techniques; Regularization; representative variable detection; Statistics; Variables
Online Access:	Full text

Description
Summary:	In this paper, we present a new framework dedicated to the robust detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Representative variables are selected using an original regularization strategy: they are the centers of specific variable clusters, denoted CORE-clusters, which respect fully interpretable constraints. Each CORE-cluster contains more than a predefined number of variables, and each pair of its variables behaves coherently in the observed data. The key advantage of our regularization strategy is therefore that it requires tuning only two intuitive parameters: the minimal dimension of the CORE-clusters and the minimum level of similarity that gathers their variables. Interpreting the role played by a selected representative variable is also straightforward, as its observed behavior is similar to that of a controlled number of other variables. After introducing and justifying this variable selection formalism, we propose two algorithmic strategies to detect the CORE-clusters, one of which scales particularly well to high-dimensional data. Finally, results obtained on synthetic as well as real data are presented.
ISSN:	1999-4893
DOI:	10.3390/a14020066
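
Illustration:	The two constraints named in the summary (a minimal cluster size and a minimum pairwise similarity) can be made concrete with a small sketch. The code below is not the authors' algorithm and does not reproduce either of the two detection strategies from the paper; it only checks whether a given candidate group of variables satisfies both constraints and picks a "center". The function names `is_core_cluster` and `center`, the use of `>=` for the size test, the reading of "center" as the member with the highest total similarity to the others, and the toy similarity values are all assumptions made for illustration.

```python
from itertools import combinations


def is_core_cluster(sim, members, min_size, min_sim):
    """Check the two constraints on a candidate cluster (illustrative only).

    sim      -- symmetric similarity matrix as a list of lists
    members  -- indices of the variables in the candidate cluster
    min_size -- assumed minimal number of variables in the cluster
    min_sim  -- assumed minimal similarity required for every pair
    """
    if len(members) < min_size:
        return False
    # Every pair of variables in the cluster must be similar enough.
    return all(sim[i][j] >= min_sim for i, j in combinations(members, 2))


def center(sim, members):
    """Return the member with the highest total similarity to the others.

    This medoid-style choice is an assumption about what "center" means.
    """
    return max(members, key=lambda i: sum(sim[i][j] for j in members if j != i))


# Hypothetical similarity matrix for four variables.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

candidate = [0, 1, 2]
ok = is_core_cluster(sim, candidate, min_size=3, min_sim=0.7)
rep = center(sim, candidate)
print(ok, rep)
```

In this toy example, variables 0, 1 and 2 form a valid group under the assumed thresholds, while adding variable 3 would violate the pairwise similarity constraint.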