A graph-based estimator of the number of clusters

Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations X1,...,Xn drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clus...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Probability and statistics 2007, Vol.11, p.272-280
Hauptverfasser: Biau, Gérard, Cadre, Benoît, Pelletier, Bruno
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations X1,...,Xn drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clusters, of the t-level set $\mathcal L(t)=\{x:f(x) \geq t\}$. The basic idea is to form a rough skeleton of the set $\mathcal L(t)$ using any preliminary estimator of f, and to count the number of connected components of the resulting graph. Under mild analytic conditions on f, and using tools from differential geometry, we establish the consistency of our method.
ISSN:1292-8100
1262-3318
DOI:10.1051/ps:2007019