Model-Based Clustering, Discriminant Analysis, and Density Estimation

Bibliographic Details
Published in:	Journal of the American Statistical Association 2002-06, Vol.97 (458), p.611-631
Main authors:	Fraley, Chris; Raftery, Adrian E
Format:	Article
Language:	English
Subjects:	Algorithms; Bayes factor; Bayesian method; Breast cancer; Breast cancer diagnosis; Cluster analysis; Covariance matrices; Datasets; Density estimation; Differential analysis; Discriminant analysis; EM algorithm; Estimation; Gene expression; Gene expression microarray data; Information classification; Markov analysis; Markov chain Monte Carlo; Methodology; Mixture model; Modeling; Multilevel models; Outliers; Parametric models; Review Papers; Software; Spatial point process; Statistical analysis; Statistical methods; Statistics

Description
Abstract:	Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent developments in model-based clustering for non-Gaussian data, high-dimensional datasets, large datasets, and Bayesian estimation.
ISSN:	0162-1459 1537-274X
DOI:	10.1198/016214502760047131