AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Background: Gaussian mixture modeling is a fundamental tool in clustering, as
well as discriminant analysis and semiparametric density estimation. However,
estimating the optimal model for any given number of components is an NP-hard
problem, and estimating the number of components is in some respects an even
harder problem. Findings: In R, a popular package called mclust addresses both
of these problems. However, Python has lacked such a package. We therefore
introduce AutoGMM, a Python algorithm for automatic Gaussian mixture modeling,
and its hierarchical version, HGMM. AutoGMM builds upon scikit-learn's
AgglomerativeClustering and GaussianMixture classes, with certain modifications
to make the results more stable. Empirically, on several different
applications, AutoGMM performs approximately as well as mclust, and sometimes
better. Conclusions: AutoGMM, a freely available Python package, enables
efficient Gaussian mixture modeling by automatically selecting the
initialization, number of clusters and covariance constraints. |
---|---|
DOI: | 10.48550/arxiv.1909.02688 |
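
The summary says the method builds on scikit-learn's AgglomerativeClustering and GaussianMixture classes and selects the initialization, number of clusters and covariance constraints automatically. The following is a minimal sketch of that general idea, not AutoGMM's actual interface or algorithm. The helper name `select_gmm`, its parameters, the use of BIC as the selection criterion, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture


def select_gmm(X, max_components=4,
               covariance_types=("full", "tied", "diag", "spherical"),
               seed=0):
    """Hypothetical helper: return the fitted GMM with the lowest BIC.

    BIC is assumed here as the selection criterion; the record above
    does not state which criterion the package uses.
    """
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        # An agglomerative partition supplies initial means and weights.
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        weights = np.bincount(labels, minlength=k) / len(X)
        for cov in covariance_types:
            gmm = GaussianMixture(
                n_components=k,
                covariance_type=cov,
                means_init=means,
                weights_init=weights,
                random_state=seed,
            )
            gmm.fit(X)
            bic = gmm.bic(X)  # lower BIC is better
            if bic < best_bic:
                best_model, best_bic = gmm, bic
    return best_model, best_bic


# Synthetic data (an assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(10.0, 1.0, size=(200, 2)),
])

model, bic = select_gmm(X)
print(model.n_components, model.covariance_type)
```

Searching over both the component count and the covariance type in one loop keeps the sketch short; a fuller implementation would likely also vary the linkage used for initialization and guard against degenerate covariance estimates.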