A convergence diagnostic for Bayesian clustering

Bibliographic details
Published in: Wiley interdisciplinary reviews. Computational statistics 2021-07, Vol.13 (4), p.e1536-n/a
Main authors: Lysy, Martin; Asgharian, Masoud; Partovi Nia, Vahid
Format: Article
Language: English
Online access: Full text
Description
Summary: In many applications of Bayesian clustering, posterior sampling on the discrete state space of cluster allocations is achieved via Markov chain Monte Carlo (MCMC) techniques. As it is typically challenging to design transition kernels to explore this state space efficiently, MCMC convergence diagnostics for clustering applications are especially important. Here we propose a diagnostic tool for discrete‐space MCMC, focusing on Bayesian clustering applications where the model parameters have been integrated out. We construct a Hotelling‐type statistic on the highest probability states, and use regenerative sampling theory to derive its equilibrium distribution. By leveraging information from the unnormalized posterior, our diagnostic offers added protection against seemingly convergent chains in which the relative frequency of visited states is incorrect. The methodology is illustrated with a Bayesian clustering analysis of genetic mutants of the flowering plant Arabidopsis thaliana.

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery
Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo

Metabolite measurements plots with agglomerative spike‐and‐slab Bayesian clustering dendrogram. We explore the convergence of the Gibbs sampler and split‐merge sampler on the same model.
ISSN: 1939-5108, 1939-0068
DOI: 10.1002/wics.1536