Statistical and Computational Trade-offs in Variational Inference: A Case Study in Inferential Model Selection
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Variational inference has recently emerged as a popular alternative to classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea is to trade statistical accuracy for computational efficiency. In this work, we study these statistical and computational trade-offs in variational inference via a case study in inferential model selection. Focusing on Gaussian inferential models (or variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-offs in two aspects: Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computation budget, a lower-rank inferential model produces variational posteriors with a higher statistical approximation error but a lower computational error; it reduces variance in stochastic optimization and, in turn, accelerates convergence. From the frequentist uncertainty quantification perspective, we consider the precision matrix of the variational posterior as an uncertainty estimate, which involves an additional statistical error originating from the sampling uncertainty of the data. As a consequence, for small datasets, the inferential model need not be full-rank to achieve the optimal estimation error (even with an unlimited computation budget).
DOI: 10.48550/arxiv.2207.11208
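
The abstract's central object, a Gaussian variational family whose precision matrix is diagonal plus low-rank, is easy to make concrete. Below is a minimal, hypothetical sketch in JAX (not the authors' code; the helper names `precision`, `sample_q`, `log_q`, and `neg_elbo`, the toy `log_joint`, and all hyperparameters are invented for illustration) of fitting such a family, with precision Lambda = D + U Uᵀ and U of rank r, by reparameterized stochastic gradients of the ELBO.

```python
# Minimal sketch (assumptions, not the paper's code): a Gaussian variational
# family q = N(mu, Lambda^{-1}) with diagonal plus low-rank precision
# Lambda = diag(exp(log_d)) + U U^T, fit by reparameterized stochastic
# gradients of the ELBO. The rank r of U is the "inferential model" knob.
import jax
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular

p, r = 10, 2  # latent dimension and low rank (r < p: lower-rank model)

def log_joint(x):
    # Toy stand-in for the log joint density log p(x, data); a standard
    # Gaussian, so the exact posterior is known.
    return -0.5 * jnp.sum(x ** 2)

def precision(params):
    """Assemble Lambda = D + U U^T from unconstrained parameters."""
    return jnp.diag(jnp.exp(params["log_d"])) + params["U"] @ params["U"].T

def sample_q(params, key, n_samples):
    """Reparameterized draws: if Lambda = L L^T, then mu + L^{-T} z ~ q."""
    L = jnp.linalg.cholesky(precision(params))
    z = jax.random.normal(key, (n_samples, p))
    eps = solve_triangular(L, z.T, lower=True, trans="T").T
    return params["mu"] + eps

def log_q(params, x):
    """Gaussian log density of q at x (rows of x are samples)."""
    lam = precision(params)
    diff = x - params["mu"]
    _, logdet = jnp.linalg.slogdet(lam)
    quad = jnp.einsum("ni,ij,nj->n", diff, lam, diff)
    return 0.5 * (logdet - quad - p * jnp.log(2.0 * jnp.pi))

def neg_elbo(params, key, n_samples=8):
    """Monte Carlo estimate of -E_q[log p(x, data) - log q(x)]."""
    x = sample_q(params, key, n_samples)
    return -(jax.vmap(log_joint)(x) - log_q(params, x)).mean()

key = jax.random.PRNGKey(0)
params = {
    "mu": jnp.zeros(p),
    "log_d": jnp.zeros(p),
    "U": 0.01 * jax.random.normal(key, (p, r)),
}
grad_fn = jax.jit(jax.grad(neg_elbo))
for _ in range(200):  # plain SGD on the stochastic ELBO objective
    key, sub = jax.random.split(key)
    grads = grad_fn(params, sub)
    params = jax.tree_util.tree_map(lambda w, g: w - 0.05 * g, params, grads)
```

Lowering r shrinks the number of variational parameters and, as the abstract argues, reduces the variance of the stochastic gradients at the price of a coarser approximation. A practical implementation would also exploit the Woodbury identity and the matrix determinant lemma so that sampling and log-density evaluation scale with the rank r rather than the full dimension p; this sketch uses a dense Cholesky factorization for brevity.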