Optimization with Access to Auxiliary Information
Format: Article
Language: English
Abstract: We investigate the fundamental optimization question of minimizing a target function $f$, whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function $h$ whose gradients are cheap or more readily available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, iv) training with compressed models/dropout, etc. We propose two new generic algorithms that apply in all these settings, and we prove that this framework yields a benefit under a Hessian similarity assumption between the target and the side information; a benefit is obtained when this similarity measure is small. We also show a potential benefit from stochasticity when the auxiliary noise is correlated with that of the target function.
DOI: 10.48550/arxiv.2206.00395
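The abstract describes a framework in which most update steps use the cheap auxiliary gradient $\nabla h$ and only occasional steps query the expensive target gradient $\nabla f$, with the achievable gain governed by a Hessian similarity measure (often formalized as $\sup_x \|\nabla^2 f(x) - \nabla^2 h(x)\| \le \delta$; this formalization is an assumption made here, not quoted from the record). The sketch below is a minimal, hypothetical illustration of one such scheme, an SVRG-style bias correction of the auxiliary gradient; it is not one of the paper's two proposed algorithms, and all function and parameter names are invented for the example.

```python
import numpy as np

def aux_corrected_descent(grad_f, grad_h, x0, lr=0.05, n_outer=20, n_inner=50):
    """Hypothetical sketch: run many cheap steps with grad_h and periodically
    query the expensive grad_f at an anchor point to remove the bias of the
    auxiliary gradient (SVRG-style control variate). Not the algorithms
    proposed in arXiv:2206.00395."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_outer):
        anchor = x.copy()
        g_f_anchor = grad_f(anchor)   # expensive target gradient, queried rarely
        g_h_anchor = grad_h(anchor)   # cheap auxiliary gradient at the same anchor
        for _ in range(n_inner):
            # Corrected direction: the cheap gradient plus a correction term.
            # Its error relative to grad_f(x) scales with how dissimilar the
            # Hessians of f and h are, so small dissimilarity helps.
            g = grad_h(x) - g_h_anchor + g_f_anchor
            x = x - lr * g
    return x

# Toy usage with made-up quadratics: h shares most of f's curvature.
A = np.diag([1.0, 10.0])
b = np.array([1.0, -2.0])
grad_f = lambda x: A @ x - b                       # "expensive" target gradient
grad_h = lambda x: (A + 0.1 * np.eye(2)) @ x       # "cheap" auxiliary gradient
x_hat = aux_corrected_descent(grad_f, grad_h, np.zeros(2))
print(x_hat, np.linalg.solve(A, b))                # x_hat should approach A^{-1} b
```

In the settings listed in the abstract, `grad_h` could correspond to a re-used batch, a related source task, or locally available client data, while `grad_f` is the scarce or expensive quantity; the mapping is illustrative only.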