Black-Box Generalization: Stability of Zeroth-Order Learning
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: We provide the first generalization error analysis for black-box learning through derivative-free optimization. Under the assumption of a Lipschitz and smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS) algorithm, which updates a $d$-dimensional model by replacing stochastic gradient directions with stochastic differences of $K+1$ perturbed loss evaluations per dataset (example) query. For both unbounded and bounded, possibly nonconvex losses, we present the first generalization bounds for the ZoSS algorithm. These bounds coincide with those for SGD and, rather surprisingly, are independent of $d$, $K$, and the batch size $m$, under appropriate choices of a slightly decreased learning rate. For bounded nonconvex losses and a batch size $m=1$, we additionally show that both the generalization error and the learning rate are independent of $d$ and $K$, and remain essentially the same as for SGD, even with only two function evaluations. Our results extend, and consistently recover, established results for SGD from prior work, for both the generalization bounds and the corresponding learning rates. If additionally $m=n$, where $n$ is the dataset size, we also derive generalization guarantees for full-batch GD.
DOI: 10.48550/arxiv.2202.06880
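
For a concrete picture of the update described in the abstract, here is a minimal NumPy sketch of a multi-point zeroth-order step in that spirit: the stochastic gradient is replaced by finite differences of $K+1$ perturbed loss evaluations. The function `zoss_step`, the Gaussian perturbation directions, and the smoothing parameter `mu` are assumptions made for this illustration, not details taken from the paper itself.

```python
import numpy as np

def zoss_step(loss, x, eta, mu, K, rng):
    """One zeroth-order update: approximate the gradient from K + 1 loss
    evaluations (one at x, K at randomly perturbed points) and take an
    SGD-style step. Gaussian directions, the smoothing radius `mu`, and
    the averaging are illustrative choices, not the paper's exact scheme."""
    base = loss(x)                        # evaluation 1 of K + 1
    g = np.zeros_like(x)
    for _ in range(K):                    # evaluations 2 .. K + 1
        u = rng.standard_normal(x.shape)
        g += (loss(x + mu * u) - base) / mu * u
    return x - eta * (g / K)              # gradient-free analogue of an SGD step

# Toy usage: d = 5 quadratic loss with K = 1, i.e. two evaluations per step.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
for _ in range(500):
    x = zoss_step(lambda w: float(np.sum(w ** 2)), x, eta=0.05, mu=1e-3, K=1, rng=rng)
print(np.linalg.norm(x))                  # should be close to 0
```

Setting `K=1` in this sketch corresponds to the two-evaluations-per-step regime mentioned in the abstract.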