Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex Minimization
Main Authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | In this paper we study the limitations of parallelization in convex
optimization. A convenient approach to studying parallelization is through the
prism of \emph{adaptivity}, an information-theoretic measure of the
parallel runtime of an algorithm [BS18]. Informally, adaptivity is the number
of sequential rounds an algorithm needs to make when it can execute
polynomially many queries in parallel in every round. For combinatorial
optimization with black-box oracle access, the study of adaptivity has recently
led to exponential accelerations in parallel runtime, and the natural question
is whether dramatic accelerations are achievable for convex optimization.
For the problem of minimizing a non-smooth convex function $f:[0,1]^n\to
\mathbb{R}$ over the unit Euclidean ball, we give a tight lower bound
showing that even when $\texttt{poly}(n)$ queries can be executed in parallel,
there is no randomized algorithm with $\tilde{o}(n^{1/3})$ rounds of adaptivity
whose convergence rate is better than that achievable with a
one-query-per-round algorithm. A similar lower bound was obtained by Nemirovski
[Nem94]; however, that result holds for the $\ell_{\infty}$-setting instead of
$\ell_2$. In addition, we show a tight lower bound that holds for
Lipschitz and strongly convex functions.
At the time of writing this manuscript we were not aware of Nemirovski's
result. The construction we use is similar to the one in [Nem94], though our
analysis is different. Due to the close relationship between this work and
[Nem94], we view the research contribution of this manuscript as limited; it
should serve as an instructive approach to understanding lower bounds for
parallel optimization. |
---|---|
DOI: | 10.48550/arxiv.1808.03880 |