Zeroth-order Optimization with Weak Dimension Dependency
Format: Article
Language: English
Abstract: Zeroth-order optimization is a fundamental research topic that has been a focus of various learning tasks, such as black-box adversarial attacks, bandits, and reinforcement learning. However, in theory, most complexity results assert a linear dependency on the dimension of the optimization variable, which renders zeroth-order algorithms theoretically impractical for high-dimensional problems and cannot explain their effectiveness in practice. In this paper, we present a novel zeroth-order optimization theory characterized by complexities that exhibit weak dependencies on dimensionality. The key contribution lies in the introduction of a new factor, denoted as $\mathrm{ED}_{\alpha}=\sup_{x\in\mathbb{R}^d}\sum_{i=1}^d\sigma_i^\alpha(\nabla^2 f(x))$ ($\alpha>0$, $\sigma_i(\cdot)$ is the $i$-th singular value in non-increasing order), which effectively functions as a measure of dimensionality. The algorithms we propose demonstrate significantly reduced complexities when measured in terms of the factor $\mathrm{ED}_{\alpha}$. Specifically, we first study a well-known zeroth-order algorithm from Nesterov and Spokoiny (2017) on quadratic objectives and show a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_1}{\sigma_d}\log(1/\epsilon)\right)$ for the strongly convex setting. Furthermore, we introduce novel algorithms that leverage the Heavy-ball mechanism; our proposed algorithm exhibits a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_{1/2}}{\sqrt{\sigma_d}}\cdot\log{\frac{L}{\mu}}\cdot\log(1/\epsilon)\right)$. We further expand the scope of the method to encompass generic smooth optimization problems under an additional Hessian-smooth condition. The resultant algorithms demonstrate remarkable complexities which improve by an order of $d$ under appropriate conditions. Our analysis lays the foundation for zeroth-order optimization methods for smooth functions in high-dimensional settings.
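
To make the effective-dimension factor concrete, here is a minimal numpy sketch (not taken from the paper) that evaluates $\mathrm{ED}_{\alpha}=\sum_{i=1}^d\sigma_i^\alpha(H)$ in the quadratic case, where the Hessian $H$ is constant and the supremum over $x$ is trivial; the function name `ed_alpha` and the example spectrum are illustrative assumptions.

```python
import numpy as np

def ed_alpha(hessian: np.ndarray, alpha: float) -> float:
    """ED_alpha = sum_i sigma_i(H)^alpha for a fixed Hessian H
    (for quadratics the Hessian is constant, so the sup over x is trivial)."""
    sigma = np.linalg.svd(hessian, compute_uv=False)  # singular values, non-increasing
    return float(np.sum(sigma ** alpha))

# Hypothetical quadratic f(x) = 0.5 * x^T A x with a fast-decaying spectrum.
d = 1000
A = np.diag(1.0 / np.arange(1, d + 1) ** 2)       # eigenvalues ~ i^{-2}
print(ed_alpha(A, 1.0), ed_alpha(A, 0.5))          # both ED_1 and ED_{1/2} are much smaller than d
```

When the Hessian spectrum decays quickly, $\mathrm{ED}_1$ and $\mathrm{ED}_{1/2}$ stay small even as $d$ grows, which is the regime in which the complexities above improve on a linear dependence on $d$.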
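The abstract refers to a Nesterov-Spokoiny style random gradient-free scheme and a Heavy-ball based algorithm; the sketch below is not the paper's method, only a generic illustration that combines the standard two-point Gaussian-smoothing estimator $g_\mu(x)=\frac{f(x+\mu u)-f(x)}{\mu}\,u$ with $u\sim\mathcal{N}(0,I_d)$ and a plain heavy-ball update. All function names, hyperparameters, and the test quadratic are placeholder assumptions.

```python
import numpy as np

def zo_grad(f, x, mu, rng):
    """Two-point Gaussian-smoothing gradient estimate in the style of
    Nesterov & Spokoiny (2017): g = (f(x + mu*u) - f(x)) / mu * u, u ~ N(0, I_d)."""
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

def zo_heavy_ball(f, x0, step=1e-2, beta=0.9, mu=1e-4, iters=20_000, seed=0):
    """Zeroth-order descent with a heavy-ball (momentum) update:
    x_{k+1} = x_k - step * g_k + beta * (x_k - x_{k-1}).
    Hyperparameters are placeholders, not the paper's schedule."""
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        g = zo_grad(f, x, mu, rng)
        x, x_prev = x - step * g + beta * (x - x_prev), x
    return x

# Toy strongly convex quadratic, queried only through function values.
d = 50
A = np.diag(1.0 / np.arange(1, d + 1) ** 2) + 1e-3 * np.eye(d)
f = lambda x: 0.5 * x @ A @ x
x_out = zo_heavy_ball(f, x0=np.ones(d))
```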
DOI: 10.48550/arxiv.2307.05753