Robust Estimation of Mixture Complexity

Bibliographic Details
Published in: Journal of the American Statistical Association, December 2006, Vol. 101 (476), pp. 1475-1486
Main authors: Woo, Mi-Ja; Sriram, T. N.
Format: Article
Language: English
Description
Abstract: In many applications, it is important to find the mixture with the fewest components, known as the mixture complexity, that provides a satisfactory fit to the data. This article focuses on developing an estimator of mixture complexity that is consistent when the component densities are of unknown form but are postulated to be members of some parametric family, and that is simultaneously robust against model misspecification. We treat the estimation of mixture complexity as a model selection problem and construct an estimator of mixture complexity as a byproduct of minimizing a Hellinger information criterion. This estimator is shown to be consistent for any parametric family of mixtures. When the model is correctly specified, Monte Carlo simulations for a wide variety of normal mixtures show that our estimator is very competitive with several others in the literature in correctly identifying the true mixture complexity. Because its construction is firmly rooted in the minimum Hellinger distance approach, our estimator naturally inherits the robustness of that approach; this property is examined through simulations under symmetric departures from postulated component normality. In terms of correctly identifying the mixture complexity under model misspecification, our estimator performs much better than an estimator based on the Kullback-Leibler distance due to James, Priebe, and Marchette. An example concerning hypertension is revisited to further illustrate the performance of our estimator.
ISSN: 0162-1459; 1537-274X
DOI: 10.1198/016214506000000555
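The abstract's core idea is to compare a nonparametric density estimate of the data with a fitted parametric mixture using the Hellinger distance, then trade that fit off against the number of components. The Python sketch below illustrates these ingredients only; it is not the authors' estimator. Its assumptions: a Gaussian kernel density estimate stands in for the data-side density, the squared Hellinger distance is taken as the integral of (sqrt f - sqrt g)^2 (so it ranges over [0, 2]), and the BIC-style penalty in the hypothetical `select_complexity` is an illustrative placeholder, not the paper's criterion. All function names are made up for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    # Density of a finite normal mixture: sum_j w_j * N(mu_j, sigma_j^2).
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def kde(data, bandwidth):
    # Gaussian kernel density estimate of the data; returns a callable density.
    n = len(data)
    def f(x):
        return sum(normal_pdf(x, xi, bandwidth) for xi in data) / n
    return f

def hellinger_sq(f, g, lo, hi, steps=4000):
    # Squared Hellinger distance, integral of (sqrt f - sqrt g)^2,
    # approximated with the trapezoidal rule on [lo, hi].
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        term = (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
        total += term * (0.5 if i in (0, steps) else 1.0)
    return total * dx

def n_params(m):
    # Free parameters of an m-component univariate normal mixture:
    # m means, m standard deviations, and m - 1 independent weights.
    return 3 * m - 1

def select_complexity(h2_by_m, n):
    # HYPOTHETICAL penalized criterion: H^2 + n_params(m) * log(n) / (2n).
    # Only illustrates the "distance + complexity penalty" trade-off;
    # it is not the Hellinger information criterion from the paper.
    def crit(m):
        return h2_by_m[m] + n_params(m) * math.log(n) / (2.0 * n)
    return min(h2_by_m, key=crit)
```

In a full procedure, each entry of `h2_by_m` would come from minimizing `hellinger_sq` between the kernel estimate and an m-component mixture over the mixture parameters (the minimum Hellinger distance fit), a step this sketch leaves out. Using sqrt-densities bounds each point's influence on the distance, which is the intuition behind the robustness the abstract describes.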