Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models
Saved in:
Main authors:
Format: Article
Language: English
Online access: Order full text
Abstract: We study the task of learning latent-variable models. An obstacle to designing efficient algorithms for such models is the need to approximate moment tensors of super-constant degree. Motivated by such applications, we develop a general efficient algorithm for implicit moment tensor computation. Our algorithm computes in $\mathrm{poly}(d, k)$ time a succinct approximate description of tensors of the form $M_m=\sum_{i=1}^{k}w_iv_i^{\otimes m}$, for $w_i\in\mathbb{R}_+$ (even for $m=\omega(1)$), assuming there exists a polynomial-size arithmetic circuit whose expected output on an appropriate samplable distribution equals $M_m$, and whose covariance on this input is bounded. Our framework broadly generalizes the work of~\cite{LL21-opt}, which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians.
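As a concrete illustration of what "implicit" access buys, consider the special case treated in~\cite{LL21-opt}: for a mixture of spherical Gaussians $\sum_i w_i N(v_i, I_d)$, the Hermite identity $\mathbb{E}_{x\sim N(\mu, I_d)}[He_m(\langle u, x\rangle)] = \langle u, \mu\rangle^m$ (for unit $u$, with $He_m$ the probabilists' Hermite polynomial) lets one estimate any rank-one contraction $\langle M_m, u^{\otimes m}\rangle = \sum_i w_i\langle v_i, u\rangle^m$ from samples in time linear in the sample size, without ever forming the $d^m$-entry tensor. The Python sketch below demonstrates only this special case, not the paper's general algorithm; the function name and all dimensions are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def contract_moment_tensor(samples, u, m):
    """Estimate <M_m, u^{(x)m}> = sum_i w_i <v_i, u>^m from samples of a
    mixture of spherical Gaussians sum_i w_i N(v_i, I_d), for a unit vector u.

    Relies on E_{x ~ N(mu, I)}[He_m(<u, x>)] = <u, mu>^m; the d^m-entry
    tensor M_m is never materialized.
    """
    u = u / np.linalg.norm(u)
    proj = samples @ u                        # project each sample onto u
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0                           # series with a single He_m term
    return hermeval(proj, coeffs).mean()      # Monte Carlo average

# Illustrative check on synthetic data (all sizes are arbitrary).
rng = np.random.default_rng(0)
d, k, n, m = 10, 3, 500_000, 4
w = np.array([0.5, 0.3, 0.2])
V = rng.normal(size=(k, d))                   # unknown component means v_i
samples = V[rng.choice(k, size=n, p=w)] + rng.normal(size=(n, d))
u = rng.normal(size=d)
u /= np.linalg.norm(u)
print(contract_moment_tensor(samples, u, m))  # approx sum_i w_i <v_i, u>^m
print((w * (V @ u) ** m).sum())               # exact target value
```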
By leveraging our general algorithm, we obtain the first polynomial-time
learners for the following models.
* Mixtures of Linear Regressions. We give a $\mathrm{poly}(d, k,
1/\epsilon)$-time algorithm for this task. The best previously known
algorithm has complexity super-polynomial in $k$. (A sketch of a moment
identity relevant to this application appears after this list.)
* Learning Mixtures of Spherical Gaussians. We give a $\mathrm{poly}(d, k,
1/\epsilon)$-time density estimation algorithm, under the condition that the
means lie in a ball of radius $O(\sqrt{\log k})$. Prior algorithms incur
super-polynomial complexity in $k$. We also give a $\mathrm{poly}(d, k,
1/\epsilon)$-time parameter estimation algorithm, under the {\em optimal} mean
separation of $\Omega(\log^{1/2}(k/\epsilon))$.
* PAC Learning Sums of ReLUs. We give a learner with complexity
$\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/\epsilon)}$. This is the first
algorithm for this task that runs in $\mathrm{poly}(d, k)$ time for subconstant
values of $\epsilon = o_{k, d}(1)$.
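To hint at how moment tensors of the form $M_m=\sum_i w_i\beta_i^{\otimes m}$ arise in the first application: for a mixture of linear regressions $y = \langle\beta_i, x\rangle + \xi$ with covariates $x \sim N(0, I_d)$ and independent noise $\xi$, Stein's identity gives $\mathbb{E}[y^m \, He_m(\langle u, x\rangle)] = m!\sum_i w_i \langle\beta_i, u\rangle^m$ for any unit vector $u$, i.e., exactly a rank-one contraction of $M_m$. This Hermite-based identity is a standard tool for moment-based MLR learning; whether the paper instantiates its arithmetic circuit this way is an assumption here, and the function name below is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial

def mlr_contraction(X, y, u, m):
    """Estimate sum_i w_i <beta_i, u>^m for a mixture of linear regressions
    y = <beta_i, x> + noise with x ~ N(0, I_d), via the Stein/Hermite identity
    E[y^m * He_m(<u, x>)] = m! * sum_i w_i <beta_i, u>^m (u a unit vector).
    """
    u = u / np.linalg.norm(u)
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0                          # select the single term He_m
    return (y ** m * hermeval(X @ u, coeffs)).mean() / factorial(m)

# Synthetic check (weights, regressors, and noise level are arbitrary).
rng = np.random.default_rng(1)
d, k, n, m = 8, 2, 500_000, 3
w = np.array([0.6, 0.4])
B = rng.normal(size=(k, d))                  # unknown regressors beta_i
X = rng.normal(size=(n, d))
labels = rng.choice(k, size=n, p=w)
y = (X * B[labels]).sum(axis=1) + 0.1 * rng.normal(size=n)
u = rng.normal(size=d)
u /= np.linalg.norm(u)
print(mlr_contraction(X, y, u, m))           # approx sum_i w_i <beta_i, u>^m
print((w * (B @ u) ** m).sum())              # exact target value
```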
DOI: 10.48550/arxiv.2411.15669