Variable Selection for Model-Based Clustering

Bibliographic details
Published in:	Journal of the American Statistical Association, March 2006, Vol. 101, No. 473, pp. 168-178
Main authors:	Raftery, Adrian E.; Dean, Nema
Format:	Article
Language:	English
Keywords:	Algorithms; Applications; Bayes factor; BIC; Cluster analysis; Exact sciences and technology; Feature selection; General topics; Mathematics; Model-based clustering; Multivariate analysis; Parametric inference; Probability and statistics; Sciences and techniques of general use; Statistics; Unsupervised learning; Variable selection; Variables

Description
Abstract:	We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed using approximate Bayes factors. A greedy search algorithm is proposed for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples and found that removing irrelevant variables often improved performance. Compared with methods based on all of the variables, our variable selection method consistently yielded more accurate estimates of the number of groups and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.
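The abstract combines two ingredients: a BIC-based approximation to the Bayes factor for comparing nested variable subsets, and a greedy search that adds and removes variables until neither step helps. The Python sketch below illustrates only that general pattern under stated assumptions; it is not the authors' implementation. The names `bic`, `approx_two_log_bayes_factor`, `stepwise_select`, and the callable `bic_diff(v, S)` are hypothetical. The caller is assumed to supply `bic_diff`, returning the evidence (as a BIC difference) for clustering on S plus v versus clustering on S alone. Fitting the mixture models behind that number, including choosing the number of clusters and the covariance model, is deliberately left out. The search below also starts from an empty set, which is a simplification.

```python
import math
from typing import Callable, FrozenSet, Hashable, List, Sequence, Set

# Hypothetical plug-in: bic_diff(v, S) > 0 means the (assumed) BIC evidence
# favours clustering on S plus v over clustering on S with v left out.
BicDiff = Callable[[Hashable, FrozenSet[Hashable]], float]


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC in the larger-is-better convention: 2*loglik - k*log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def approx_two_log_bayes_factor(bic_1: float, bic_2: float) -> float:
    """Approximate 2*log(B_12), the evidence for model 1 over model 2."""
    return bic_1 - bic_2


def stepwise_select(
    variables: Sequence[Hashable],
    bic_diff: BicDiff,
    max_iter: int = 100,
) -> Set[Hashable]:
    """Greedy inclusion/exclusion search returning a locally optimal subset."""
    selected: Set[Hashable] = set()
    for _ in range(max_iter):
        changed = False

        # Inclusion step: consider the outside variable with the largest
        # BIC difference and add it only if the evidence is positive.
        outside: List[Hashable] = [v for v in variables if v not in selected]
        if outside:
            current = frozenset(selected)
            add_scores = {v: bic_diff(v, current) for v in outside}
            best = max(outside, key=lambda v: add_scores[v])
            if add_scores[best] > 0:
                selected.add(best)
                changed = True

        # Exclusion step: score each selected variable against the others
        # and drop the weakest one if the evidence for keeping it is negative.
        inside = [v for v in variables if v in selected]
        if inside:
            keep_scores = {v: bic_diff(v, frozenset(selected - {v})) for v in inside}
            worst = min(inside, key=lambda v: keep_scores[v])
            if keep_scores[worst] < 0:
                selected.remove(worst)
                changed = True

        # Stop once neither step changes the subset (a local optimum).
        if not changed:
            break
    return selected


if __name__ == "__main__":
    # Toy, made-up scores: "a" and "b" carry cluster structure, "x" and "y" are noise.
    def toy_bic_diff(v, current):
        return 10.0 if v in {"a", "b"} else -5.0

    print(sorted(stepwise_select(["a", "x", "b", "y"], toy_bic_diff)))  # ['a', 'b']
```

Alternating inclusion and exclusion lets the search drop a variable that looked useful on its own but becomes redundant once a more informative variable is selected. Because each step only compares neighbouring subsets, the result is a local rather than a global optimum, as the abstract notes.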
ISSN:	0162-1459 (print), 1537-274X (online)
DOI:	10.1198/016214506000000113