CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

Bibliographic Details
Published in:	IEEE Transactions on Automatic Control, 2024-09, pp. 1-16
Main authors:	Huang, Kun; Pu, Shi
Format:	Article
Language:	English
Keywords:	compressed gradient methods, Compressors, Convergence, convex optimization, distributed optimization, Gradient methods, Lead, Linear programming, nonconvex optimization, Optimization, Radio frequency, Signal to noise ratio, stochastic gradient methods, Stochastic processes, Transient analysis

Description
Abstract:	In this paper, we consider solving the distributed optimization problem over a multi-agent network in a communication-restricted setting. We study a compressed decentralized stochastic gradient method, termed "compressed exact diffusion with adaptive stepsizes (CEDAS)", and show that, under unbiased compression operators, the method asymptotically achieves a convergence rate comparable to that of centralized stochastic gradient descent (SGD) for both smooth strongly convex and smooth nonconvex objective functions. To our knowledge, CEDAS enjoys the shortest transient time to date (with respect to the graph specifics) for achieving the convergence rate of centralized SGD: it behaves as $\mathcal{O}(nC^{3}/(1-\lambda_{2})^{2})$ under smooth strongly convex objective functions and $\mathcal{O}(n^{3}C^{6}/(1-\lambda_{2})^{4})$ under smooth nonconvex objective functions, where $(1-\lambda_{2})$ denotes the spectral gap of the mixing matrix and $C>0$ is the compression-related parameter. In particular, CEDAS exhibits the shortest transient times when $C < \mathcal{O}(1/(1-\lambda_{2})^{2})$, a condition that is common in practice. Numerical experiments further demonstrate the effectiveness of the proposed algorithm.
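The abstract states its results in terms of two quantities: the compression parameter $C$ of an unbiased compressor and the spectral gap $1-\lambda_2$ of the mixing matrix. The Python sketch below illustrates both and plugs them into the stated transient-time orders (constants dropped). It rests on several assumptions that do not come from the paper. It uses a rand-k sparsifier as the compressor and reads $C$ through the common condition $\mathbb{E}\|Q(x)-x\|^2 \le C\|x\|^2$. It uses a ring graph with uniform weights 1/3, takes $\lambda_2$ as the second-largest eigenvalue of a symmetric $W$, and treats $n$ as the number of agents. All function names are hypothetical. The sketch does not implement CEDAS itself.

```python
import numpy as np


def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Illustrative unbiased rand-k sparsifier (assumed, not taken from the paper):
    keep k random coordinates of x and rescale them by d/k so that E[Q(x)] = x."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x, dtype=float)
    out[idx] = x[idx] * (d / k)
    return out


def rand_k_C(d: int, k: int) -> float:
    """Constant C in the assumed condition E||Q(x) - x||^2 <= C ||x||^2.
    For rand-k the condition holds with equality at C = d/k - 1."""
    return d / k - 1.0


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic W for a ring of n >= 3 agents:
    weight 1/3 on each agent itself and on each of its two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def spectral_gap(W: np.ndarray) -> float:
    """1 - lambda_2, with lambda_2 taken as the second-largest eigenvalue
    of the symmetric matrix W (an assumed convention)."""
    eig = np.sort(np.linalg.eigvalsh(W))[::-1]  # eigenvalues in descending order
    return 1.0 - eig[1]


def transient_orders(n: int, C: float, gap: float) -> tuple[float, float]:
    """Orders from the abstract with constants dropped:
    strongly convex ~ n C^3 / gap^2, nonconvex ~ n^3 C^6 / gap^4."""
    return n * C**3 / gap**2, n**3 * C**6 / gap**4


if __name__ == "__main__":
    n, d, k = 10, 100, 50            # hypothetical network and model sizes
    gap = spectral_gap(ring_mixing_matrix(n))
    C = rand_k_C(d, k)                # C = 1.0 for d = 100, k = 50
    sc, ncvx = transient_orders(n, C, gap)
    # Rough order-of-magnitude check of the regime C < 1/(1 - lambda_2)^2
    print(f"gap={gap:.4f}  C={C:.1f}  C < 1/gap^2: {C < 1.0 / gap**2}")
    print(f"strongly convex ~ {sc:.3g}, nonconvex ~ {ncvx:.3g}")
```

Because the $\mathcal{O}(\cdot)$ expressions hide constants, the printed numbers only indicate how the transient times scale with $n$, $C$, and the spectral gap. They are not predicted iteration counts. Sparser graphs (smaller gap) or more aggressive compression (larger $C$) inflate the nonconvex order much faster than the strongly convex one.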
ISSN:	0018-9286 (print), 1558-2523 (electronic)
DOI:	10.1109/TAC.2024.3471854