Stabilized Proximal-Point Methods for Federated Optimization
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In developing efficient optimization algorithms, it is crucial to account for communication constraints, a significant challenge in modern Federated Learning. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems at each iteration and that can exploit second-order similarity among individual functions. However, to achieve such communication efficiency, the algorithm requires solving local subproblems sufficiently accurately, resulting in slightly sub-optimal local complexity. Inspired by the hybrid-projection proximal-point method, in this work we propose a novel distributed algorithm, S-DANE. Compared to DANE, this method uses an auxiliary sequence of prox-centers while maintaining the same deterministic communication complexity. Moreover, its accuracy condition for solving the subproblem is milder, leading to enhanced local computation efficiency. Furthermore, S-DANE supports partial client participation and arbitrary stochastic local solvers, making it attractive in practice. We further accelerate S-DANE and show that the resulting algorithm achieves the best-known communication complexity among all existing methods for distributed convex optimization, while still enjoying the same good local computation efficiency as S-DANE. Finally, we propose adaptive variants of both methods using line search, obtaining the first provably efficient adaptive algorithms that can exploit local second-order similarity without prior knowledge of any parameters.
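
As a concrete illustration of the proximal-point structure described in the abstract, the Python sketch below runs a plain DANE-style round on a synthetic least-squares problem: each client solves its regularized local subproblem (here in closed form, since the objective is quadratic) and the server averages the results. The test problem, the exact local solver, and all names (`make_clients`, `local_grad`, `dane_step`, `mu`) are illustrative assumptions rather than the authors' implementation; S-DANE additionally maintains an auxiliary sequence of prox-centers and only requires the subproblems to be solved inexactly.

```python
import numpy as np

# Illustrative sketch only: a plain DANE-style round on synthetic least squares.
# S-DANE (the paper's method) further uses an auxiliary sequence of prox-centers
# and allows the local subproblems to be solved inexactly.

def make_clients(num_clients=4, n_per_client=50, dim=10, seed=0):
    """Per-client data (A_i, b_i) for f_i(x) = ||A_i x - b_i||^2 / (2 n_i)."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        A = rng.normal(size=(n_per_client, dim))
        b = A @ x_star + 0.01 * rng.normal(size=n_per_client)
        clients.append((A, b))
    return clients

def local_grad(A, b, x):
    """Gradient of the local quadratic f_i at x."""
    return A.T @ (A @ x - b) / A.shape[0]

def dane_step(clients, x_center, mu):
    """One synchronous DANE-style round around the prox-center x_center.

    Each client solves
        min_x  f_i(x) - <grad f_i(x_center) - grad f(x_center), x>
               + (mu / 2) * ||x - x_center||^2,
    which is available in closed form for quadratic f_i; the server averages.
    """
    dim = x_center.shape[0]
    grads = [local_grad(A, b, x_center) for A, b in clients]
    grad_global = np.mean(grads, axis=0)          # one round of communication
    updates = []
    for (A, b), g_i in zip(clients, grads):
        n_i = A.shape[0]
        H = A.T @ A / n_i + mu * np.eye(dim)      # local Hessian + prox term
        rhs = A.T @ b / n_i + (g_i - grad_global) + mu * x_center
        updates.append(np.linalg.solve(H, rhs))
    return np.mean(updates, axis=0)               # second round of communication

if __name__ == "__main__":
    clients = make_clients()
    x = np.zeros(10)
    for t in range(20):
        x = dane_step(clients, x_center=x, mu=0.1)
        g = np.mean([local_grad(A, b, x) for A, b in clients], axis=0)
        print(f"round {t:2d}  ||grad f(x)|| = {np.linalg.norm(g):.3e}")
```

The closed-form solve stands in for whatever local solver a client would actually run; with inexact local solvers, a milder accuracy condition of the kind the abstract describes is what reduces local computation.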
DOI: 10.48550/arxiv.2407.07084