Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

Bibliographic details
Published in:	Journal of Optimization Theory and Applications, October 2022, Vol. 195 (1), pp. 102-130
Authors:	Salim, Adil; Condat, Laurent; Mishchenko, Konstantin; Richtárik, Peter
Format:	Article
Language:	English
Subjects:	Algorithms; Applications of Mathematics; Calculus of Variations and Optimal Control; Optimization; Convergence; Convexity; Engineering; Image processing; Linear operators; Machine learning; Mathematics; Mathematics and Statistics; Operations Research/Decision Theory; Operators (mathematics); Optimization; Theory of Computation

Description
Abstract:	We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable, and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal–dual algorithm, which we call PDDY, for this problem. It is constructed by applying Davis–Yin splitting to a monotone inclusion in a primal–dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat–Vũ algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal–dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in the presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.
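For reference, the template problem from the abstract can be written out together with its standard saddle-point reformulation. Here R denotes the second (proximable) function and H the function composed with L. The notation is chosen here and is not taken from the paper: X and Y are the primal and dual spaces, H* is the convex conjugate of H, and L* is the adjoint of L. The equality assumes H is proper, closed and convex, so that H = H**. The second line shows a standard form of the primal–dual optimality inclusion. A pair (x*, y*) satisfying it is a saddle point, and under a usual qualification condition such a pair exists for any minimizer. This is a generic statement, not necessarily the exact inclusion or metric used in the paper.

```latex
\begin{aligned}
&\min_{x \in \mathcal{X}} \; F(x) + R(x) + H(Lx)
  \;=\; \min_{x \in \mathcal{X}} \, \max_{y \in \mathcal{Y}} \; F(x) + R(x) + \langle Lx, y \rangle - H^*(y), \\
&0 \in \nabla F(x^\star) + \partial R(x^\star) + L^* y^\star,
  \qquad 0 \in -L x^\star + \partial H^*(y^\star).
\end{aligned}
```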
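PDDY is described as Davis–Yin splitting applied in a primal–dual product space. The sketch below shows only plain Davis–Yin three-operator splitting, in its commonly stated form for min f(x) + g(x) + h(x) with h smooth. It is not PDDY itself. The function names, the toy problem, and the step size γ = 1 are assumptions made for this illustration. The step size is valid because ∇h is 1-Lipschitz here, so γ < 2.

```python
import numpy as np


def davis_yin(prox_f, prox_g, grad_h, z0, gamma, relax=1.0, n_iter=500):
    """Plain Davis-Yin three-operator splitting for min f(x) + g(x) + h(x).

    prox_f(v, gamma) and prox_g(v, gamma) evaluate proximity operators and
    grad_h is the gradient of the smooth term h. gamma should be < 2 / beta,
    where beta is the Lipschitz constant of grad_h.
    """
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_iter):
        x_g = prox_g(z, gamma)                                   # backward step on g
        x_f = prox_f(2 * x_g - z - gamma * grad_h(x_g), gamma)   # forward on h, backward on f
        z = z + relax * (x_f - x_g)                              # update the governing sequence
    return prox_g(z, gamma)  # solution estimate: prox of g at the fixed point


# Hypothetical toy instance: h(x) = 0.5 * ||x - b||^2, f = reg * ||x||_1,
# g = indicator of the nonnegative orthant. Its minimizer is max(b - reg, 0).
b = np.array([2.0, -1.0, 0.3, 0.8])
reg = 0.5


def soft_threshold(v, gamma):
    """Proximity operator of gamma * reg * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * reg, 0.0)


def project_nonneg(v, gamma):
    """Proximity operator of the indicator of {x >= 0} (independent of gamma)."""
    return np.maximum(v, 0.0)


def grad_h(x):
    """Gradient of h(x) = 0.5 * ||x - b||^2."""
    return x - b


x_hat = davis_yin(soft_threshold, project_nonneg, grad_h, np.zeros_like(b), gamma=1.0)
print(x_hat)
```

On this instance the iterate matches max(b - reg, 0), i.e. about [1.5, 0, 0, 0.3]. The three sub-steps use one proximal step on each nonsmooth term and one gradient evaluation per iteration. This per-iteration cost profile is what makes Davis–Yin splitting attractive for the three-term template.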
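The abstract names the Condat–Vũ algorithm as a member of the same class of primal–dual methods. As an illustration only, the sketch below implements one common form of the Condat–Vũ iteration: primal step first, no relaxation. It runs on a hypothetical 1D total-variation denoising instance defined as follows:

- F(x) = 0.5‖x − b‖²
- R is the indicator of x ≥ 0
- H = reg·‖·‖₁
- L is the forward-difference operator

The instance, names and step sizes are assumptions. The step sizes satisfy τ(β/2 + σ‖L‖²) < 1 with β = 1 and the bound ‖L‖² ≤ 4.

```python
import numpy as np


def diff_op(x):
    """L x: forward differences x[i+1] - x[i], length n - 1."""
    return np.diff(x)


def diff_adjoint(y):
    """L^* y, the adjoint of diff_op: (L^* y)[j] = y[j-1] - y[j], zero-padded."""
    return np.concatenate(([0.0], y)) - np.concatenate((y, [0.0]))


def condat_vu_tv(b, reg, tau=0.5, sigma=0.25, n_iter=3000):
    """Condat-Vu iteration for min 0.5||x-b||^2 + i_{x>=0}(x) + reg * ||L x||_1."""
    beta, norm_L_sq = 1.0, 4.0  # Lipschitz constant of grad F; upper bound on ||L||^2
    assert tau * (beta / 2 + sigma * norm_L_sq) < 1, "step sizes too large"
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    y = np.zeros(b.size - 1)
    for _ in range(n_iter):
        grad = x - b  # gradient of F
        # primal step: prox of tau*R is the projection onto {x >= 0}
        x_new = np.maximum(x - tau * (grad + diff_adjoint(y)), 0.0)
        # dual step: prox of sigma*H^* is the projection onto the l-inf ball of radius reg
        y = np.clip(y + sigma * diff_op(2 * x_new - x), -reg, reg)
        x = x_new
    return x, y


# Large reg: the total-variation term dominates and the signal is flattened to its mean.
x_flat, _ = condat_vu_tv([0.0, 0.0, 1.0, 1.0], reg=10.0)
# reg = 0: the problem reduces to projecting b onto {x >= 0}.
x_plain, _ = condat_vu_tv([-1.0, 2.0, 3.0], reg=0.0)
print(x_flat, x_plain)
```

With a large reg the result is flat at the mean of b (0.5). With reg = 0 it equals max(b, 0). Only L and its adjoint are applied, never an inverse or a proximity operator of H∘L. This is why such primal–dual methods suit problems where H∘L itself is not proximable.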
ISSN:	0022-3239 (print); 1573-2878 (electronic)
DOI:	10.1007/s10957-022-02061-8