Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex Minimization

Bibliographic Details
Main Authors: Balkanski, Eric; Singer, Yaron
Format: Article
Language: English
Description
Summary: In this paper we study the limitations of parallelization in convex optimization. A convenient approach to studying parallelization is through the prism of \emph{adaptivity}, which is an information-theoretic measure of the parallel runtime of an algorithm [BS18]. Informally, adaptivity is the number of sequential rounds an algorithm needs to make when it can execute polynomially many queries in parallel at every round. For combinatorial optimization with black-box oracle access, the study of adaptivity has recently led to exponential accelerations in parallel runtime, and the natural question is whether dramatic accelerations are achievable for convex optimization. For the problem of minimizing a non-smooth convex function $f:[0,1]^n\to \mathbb{R}$ over the unit Euclidean ball, we give a tight lower bound showing that even when $\texttt{poly}(n)$ queries can be executed in parallel, no randomized algorithm with $\tilde{o}(n^{1/3})$ rounds of adaptivity has a convergence rate better than those achievable with a one-query-per-round algorithm. A similar lower bound was obtained by Nemirovski [Nem94]; however, that result holds in the $\ell_{\infty}$-setting instead of $\ell_2$. In addition, we show a tight lower bound that holds for Lipschitz and strongly convex functions. At the time of writing this manuscript we were not aware of Nemirovski's result. The construction we use is similar to the one in [Nem94], though our analysis is different. Due to the close relationship between this work and [Nem94], we view the research contribution of this manuscript as limited; it should serve as an instructive approach to understanding lower bounds for parallel optimization.
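
The one-query-per-round baseline referenced in the summary is, classically, the projected subgradient method, which attains an $O(1/\sqrt{T})$ error after $T$ queries for Lipschitz convex functions over the unit Euclidean ball; the lower bound says that $\tilde{o}(n^{1/3})$ adaptive rounds cannot improve on this rate even with $\texttt{poly}(n)$ queries per round. The following is a minimal sketch of that baseline, not code from the paper; the test objective, step-size schedule, and parameter names are illustrative assumptions.

    import numpy as np

    def project_ball(x):
        """Project x onto the unit Euclidean ball."""
        norm = np.linalg.norm(x)
        return x if norm <= 1.0 else x / norm

    def subgradient_method(subgrad, n, T, L=1.0):
        """One-query-per-round projected subgradient method.

        Each round issues a single subgradient-oracle query, so the
        number of rounds equals the number of queries; this is the
        sequential baseline the adaptivity lower bound compares against.
        """
        x = np.zeros(n)
        avg = np.zeros(n)
        for t in range(1, T + 1):
            g = subgrad(x)                 # one oracle query per round
            step = 1.0 / (L * np.sqrt(t))  # standard O(1/sqrt(T)) schedule
            x = project_ball(x - step * g)
            avg += (x - avg) / t           # running average of iterates
        return avg

    # Illustrative non-smooth objective: f(x) = max_i |x_i - c_i|
    # (a hypothetical test function, not the hard instance from the paper)
    rng = np.random.default_rng(0)
    n = 50
    c = project_ball(rng.normal(size=n)) * 0.5

    def subgrad(x):
        i = np.argmax(np.abs(x - c))
        g = np.zeros(n)
        g[i] = np.sign(x[i] - c[i])
        return g

    x_hat = subgradient_method(subgrad, n, T=2000)
    print("f(x_hat) =", np.max(np.abs(x_hat - c)))  # approaches the optimum 0
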
DOI: 10.48550/arxiv.1808.03880