Parameter Estimation for Gibbs Distributions

Bibliographic details
Published in: ACM Transactions on Algorithms, January 2025, Vol. 21 (1), pp. 1-39, Article 3
Main authors: Harris, David G.; Kolmogorov, Vladimir
Format: Article
Language: English
Description
Abstract: A central problem in computational statistics is to convert a procedure for sampling combinatorial objects into a procedure for counting those objects, and vice versa. We consider sampling problems coming from Gibbs distributions, which are families of probability distributions over a discrete space \(\Omega\) with probability mass function of the form \(\mu^{\Omega}_{\beta}(\omega)\propto e^{\beta H(\omega)}\) for \(\beta\) in an interval \([\beta_{\min},\beta_{\max}]\) and \(H(\omega)\in\{0\}\cup[1,n]\). Two important parameters are the partition function, which is the normalization factor \(Z(\beta)=\sum_{\omega\in\Omega}e^{\beta H(\omega)}\), and the vector of pre-image counts \(c_{x}=|H^{-1}(x)|\). We develop black-box sampling algorithms to estimate the counts using roughly \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions, where \(q=\log\frac{Z(\beta_{\max})}{Z(\beta_{\min})}\) (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate this with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we estimate all values of the partition function using \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions. This improves on a prior algorithm of Huber (2015), which computes a single point estimate \(Z(\beta_{\max})\) and uses slightly more samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of \(n\) and \(q\) up to logarithmic terms.
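The quantities named in the abstract (the Gibbs weights, \(Z(\beta)\), the counts \(c_x\), and \(q\)) can be made concrete with a small Python sketch. It is not the authors' algorithm. It assumes a hypothetical toy instance whose counts are known, the independent sets of a 3-vertex path graded by size, so that \(Z(\beta)=\sum_x c_x e^{\beta x}\) can be computed exactly. In place of the paper's estimators, it uses a standard telescoping-product estimator based on the identity \(Z(\beta')/Z(\beta)=\mathbb{E}_{\mu_\beta}[e^{(\beta'-\beta)H}]\), replacing each expectation by a sample mean. All names (COUNTS, partition_function, sample_levels, estimate_ratio) and the \(\beta\) schedule are illustrative choices. The "sampler" here cheats by using the known counts, standing in for a genuine black-box sampler.

```python
import math
import random
from typing import Dict, List

# Hypothetical toy instance: independent sets of the path 1-2-3, graded by
# size H(I) = |I|. c[x] = |H^{-1}(x)|: {} ; {1},{2},{3} ; {1,3}.
COUNTS: Dict[int, int] = {0: 1, 1: 3, 2: 1}


def partition_function(counts: Dict[int, int], beta: float) -> float:
    """Z(beta) = sum_omega exp(beta * H(omega)) = sum_x c_x * exp(beta * x)."""
    return sum(c * math.exp(beta * x) for x, c in counts.items())


def level_probability(counts: Dict[int, int], beta: float, x: int) -> float:
    """P[H(omega) = x] when mu_beta(omega) is proportional to exp(beta * H(omega))."""
    return counts.get(x, 0) * math.exp(beta * x) / partition_function(counts, beta)


def log_ratio_q(counts: Dict[int, int], beta_min: float, beta_max: float) -> float:
    """q = log(Z(beta_max) / Z(beta_min))."""
    return math.log(partition_function(counts, beta_max)
                    / partition_function(counts, beta_min))


def sample_levels(counts: Dict[int, int], beta: float, k: int,
                  rng: random.Random) -> List[int]:
    """Stand-in for a black-box sampler: k draws of H(omega) with omega ~ mu_beta.
    Assumption: it uses the known counts, which a real sampler would not have."""
    levels = list(counts)
    weights = [counts[x] * math.exp(beta * x) for x in levels]
    return rng.choices(levels, weights=weights, k=k)


def estimate_ratio(counts: Dict[int, int], betas: List[float], k: int,
                   rng: random.Random) -> float:
    """Telescoping-product estimate of Z(betas[-1]) / Z(betas[0]).

    For consecutive b < b', E_{mu_b}[exp((b' - b) * H)] = Z(b') / Z(b);
    each factor is replaced by a mean over k samples drawn from mu_b."""
    ratio = 1.0
    for b, b_next in zip(betas, betas[1:]):
        hs = sample_levels(counts, b, k, rng)
        ratio *= sum(math.exp((b_next - b) * h) for h in hs) / k
    return ratio


if __name__ == "__main__":
    rng = random.Random(0)
    exact = partition_function(COUNTS, 1.0) / partition_function(COUNTS, 0.0)
    approx = estimate_ratio(COUNTS, [0.0, 0.5, 1.0], k=20000, rng=rng)
    print(partition_function(COUNTS, 0.0))  # 5.0 = number of independent sets
    print(exact, approx)
```

Because every weight equals 1 at \(\beta=0\), \(Z(0)=|\Omega|\), so the script first prints 5.0. The evenly spaced schedule [0.0, 0.5, 1.0] is arbitrary. Choosing the intermediate \(\beta\) values so that each factor has small variance is where much of the difficulty lies in practice, and this sketch makes no attempt at the near-optimal sample counts the abstract describes.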
ISSN: 1549-6325, 1549-6333
DOI: 10.1145/3685676