A graph-based estimator of the number of clusters

Bibliographic Details
Published in:	Probability and statistics 2007, Vol. 11, p. 272-280
Main Authors:	Biau, Gérard; Cadre, Benoît; Pelletier, Bruno
Format:	Article
Language:	English
Subjects:	62G05; 62G20; Algorithms; Cluster analysis; Clustering; connected component; Data analysis; graph; level set; Mathematics; Random variables; Statistics; Statistics Theory; tubular neighborhood
Online Access:	Full text

Description
Summary:	Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations X1,...,Xn drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clusters, of the t-level set $\mathcal L(t)=\{x:f(x) \geq t\}$. The basic idea is to form a rough skeleton of the set $\mathcal L(t)$ using any preliminary estimator of f, and to count the number of connected components of the resulting graph. Under mild analytic conditions on f, and using tools from differential geometry, we establish the consistency of our method.
ISSN:	1292-8100 1262-3318
DOI:	10.1051/ps:2007019
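
Illustration: the idea in the summary (estimate f, keep the observations that fall in the estimated t-level set, link nearby ones into a graph, count its connected components) can be sketched as follows. This is only one plausible reading of the summary, not the authors' implementation. The Gaussian kernel density estimator, the bandwidth `h`, the connection radius `r`, and the toy data helper `two_grids` are all assumptions introduced here for illustration; the summary only asks for "any preliminary estimator of f".

```python
import math


def kde(points, x, h):
    """Gaussian kernel density estimate at x (assumed preliminary estimator of f)."""
    d = len(x)
    norm = len(points) * (h * math.sqrt(2.0 * math.pi)) ** d
    total = sum(math.exp(-math.dist(p, x) ** 2 / (2.0 * h * h)) for p in points)
    return total / norm


def count_clusters(points, t, r, h):
    """Count connected components of a graph built on the estimated t-level set.

    Vertices: observations X_i whose estimated density is at least t.
    Edges: pairs of retained observations at Euclidean distance <= r
    (r is a hypothetical tuning parameter, not taken from the summary).
    """
    kept = [p for p in points if kde(points, p, h) >= t]
    parent = list(range(len(kept)))  # union-find forest over retained points

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i], kept[j]) <= r:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(kept))})


def two_grids():
    """Hypothetical toy data: two 5x5 grids (spacing 0.2) set 10 units apart."""
    grid = [(0.2 * i, 0.2 * j) for i in range(5) for j in range(5)]
    return grid + [(x + 10.0, y) for (x, y) in grid]


if __name__ == "__main__":
    print(count_clusters(two_grids(), t=0.1, r=0.25, h=0.5))
```

The pairwise loop costs O(n^2) distance evaluations, which is acceptable for a sketch. In practice the result depends on how `t`, `r`, and `h` are chosen; the summary states consistency "under mild analytic conditions on f" but does not specify these choices.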