A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Bilevel programming has recently received attention in the literature due to
its wide range of applications, including reinforcement learning and
hyper-parameter optimization. However, it is widely assumed that the underlying
bilevel optimization problem is solved either by a single machine or by
multiple machines connected in a star-shaped network, i.e., in a
federated learning setting. The latter approach suffers from a high
communication cost on the central node (e.g., parameter server). Hence, there
is an interest in developing methods that solve bilevel optimization problems
in a communication-efficient, decentralized manner. To that end, this paper
introduces a penalty function-based decentralized algorithm with theoretical
guarantees for this class of optimization problems. Specifically, a distributed
alternating gradient-type algorithm for solving consensus bilevel programming
over a decentralized network is developed. A key feature of the proposed
algorithm is the estimation of the hyper-gradient of the penalty function
through decentralized computation of matrix-vector products and a few vector
communications. The estimation is integrated into an alternating algorithm for
solving the penalized reformulation of the bilevel optimization problem. Under
appropriate step sizes and penalty parameters, our theoretical framework
ensures non-asymptotic convergence to the optimal solution of the original
problem under various convexity conditions. Our theoretical result highlights
improvements in the iteration complexity of decentralized bilevel optimization,
all while making efficient use of vector communication. Empirical results
demonstrate that the proposed method performs well in real-world settings. |
---|---|
DOI: | 10.48550/arxiv.2211.04088 |
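
The summary contrasts a star-shaped network, where every machine talks to a central node, with a decentralized network, where machines exchange vectors only with their neighbors. The sketch below illustrates that communication pattern with plain gossip averaging on a ring. It is not the algorithm proposed in the paper: the ring topology, the uniform 1/3 weights, and the function names `ring_mixing_matrix` and `gossip_average` are all assumptions chosen for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Hypothetical helper: doubly stochastic mixing matrix for a ring of n nodes.

    Each node weights itself and its two ring neighbors by 1/3.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def gossip_average(local_vectors, W, rounds):
    """Run `rounds` neighbor-only communication rounds.

    Row i of the returned array is the vector held by node i. In every
    round a node replaces its vector with a weighted average of its own
    vector and those of its neighbors (one vector sent per neighbor).
    """
    X = np.asarray(local_vectors, dtype=float)
    for _ in range(rounds):
        X = W @ X
    return X


if __name__ == "__main__":
    W = ring_mixing_matrix(5)
    X0 = np.arange(10, dtype=float).reshape(5, 2)  # one 2-d vector per node
    X = gossip_average(X0, W, rounds=100)
    # Largest remaining disagreement between any node and the network-wide mean.
    print(np.abs(X - X0.mean(axis=0)).max())
```

Because the mixing matrix is doubly stochastic and the ring is connected, every node's vector approaches the network-wide average without any central server. A decentralized bilevel method of the kind the summary describes would interleave such neighbor exchanges with local gradient and matrix-vector computations.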