Parameter Estimation for Gibbs Distributions
Published in: ACM Transactions on Algorithms, 2025-01, Vol. 21 (1), pp. 1-39, Article 3
Format: Article
Language: English
Online access: Full text
Abstract: A central problem in computational statistics is to convert a procedure for sampling combinatorial objects into a procedure for counting those objects, and vice versa. We consider sampling problems coming from Gibbs distributions, which are families of probability distributions over a discrete space \(\Omega\) with probability mass function of the form \(\mu^{\Omega}_{\beta}(\omega)\propto e^{\beta H(\omega)}\) for \(\beta\) in an interval \([\beta_{\min},\beta_{\max}]\) and \(H(\omega)\in\{0\}\cup[1,n]\). Two important parameters are the partition function, which is the normalization factor \(Z(\beta)=\sum_{\omega\in\Omega}e^{\beta H(\omega)}\), and the vector of pre-image counts \(c_{x}=|H^{-1}(x)|\). We develop black-box sampling algorithms to estimate the counts using roughly \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions, where \(q=\log\frac{Z(\beta_{\max})}{Z(\beta_{\min})}\) (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we estimate all values of the partition function using \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions. This improves over a prior algorithm of Huber (2015), which computes a single point estimate of \(Z(\beta_{\max})\) and uses slightly more samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of \(n\) and \(q\) up to logarithmic terms.
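To make the abstract's definitions concrete, the following is a minimal, illustrative sketch (not the paper's algorithm): a toy Gibbs distribution where \(\Omega\) is the set of subsets of \(\{0,\dots,n-1\}\) and \(H(\omega)=|\omega|\), so \(H\) is integer-valued in \(\{0\}\cup[1,n]\). The choice of \(\Omega\), \(H\), and the interval \([\beta_{\min},\beta_{\max}]\) below are assumptions made for illustration; the brute-force enumeration is only feasible for tiny spaces, whereas the paper's algorithms assume only black-box sample access.

```python
import math
import random
from itertools import combinations

# Toy discrete space: all subsets of {0, ..., n-1}, with H(omega) = |omega|.
n = 4
Omega = [frozenset(c) for r in range(n + 1)
         for c in combinations(range(n), r)]
H = len  # H(omega) = |omega|, taking values in {0} u [1, n]

def Z(beta):
    """Partition function Z(beta) = sum_{omega in Omega} exp(beta * H(omega))."""
    return sum(math.exp(beta * H(w)) for w in Omega)

def gibbs_sample(beta, rng=random):
    """One sample from mu_beta(omega), proportional to exp(beta * H(omega)).

    Done here by brute-force enumeration; the black-box algorithms in the
    paper only assume access to such a sampler, not to Omega itself.
    """
    weights = [math.exp(beta * H(w)) for w in Omega]
    return rng.choices(Omega, weights=weights, k=1)[0]

# Pre-image counts c_x = |H^{-1}(x)|; for this toy space they are the
# binomial coefficients C(n, x).
c = [sum(1 for w in Omega if H(w) == x) for x in range(n + 1)]

# Sanity check: Z(beta) can equivalently be written sum_x c_x * exp(beta * x).
beta = 0.7
assert abs(Z(beta) - sum(cx * math.exp(beta * x)
                         for x, cx in enumerate(c))) < 1e-9

# The parameter q = log(Z(beta_max) / Z(beta_min)) governing the sample
# complexity for general distributions (interval chosen arbitrarily here).
beta_min, beta_max = 0.0, 2.0
q = math.log(Z(beta_max) / Z(beta_min))
```

Note that \(Z(0)=|\Omega|\), so estimating \(Z(\beta_{\max})\) relative to a known \(Z(\beta_{\min})\) is what connects sampling to counting in this setting.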
ISSN: 1549-6325, 1549-6333
DOI: 10.1145/3685676