Federated Optimization of Smooth Loss Functions

Bibliographic Details
Published in: IEEE Transactions on Information Theory, Dec. 2023, Vol. 69, No. 12, pp. 7836-7866
Authors: Jadbabaie, Ali; Makur, Anuran; Shah, Devavrat
Format: Article
Language: English
Description
Abstract: In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server seeks to minimize an ERM objective function using n samples of training data that are stored across m clients and the server. The recent flurry of research in this area has identified the Federated Averaging ($\mathtt{FedAve}$) algorithm as the staple for determining $\epsilon$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, e.g., stochastic gradient descent, the convergence analysis of $\mathtt{FedAve}$ and its variants relies only on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data as well. To exploit this additional smoothness in the data in a federated learning context, we propose the Federated Low Rank Gradient Descent ($\mathtt{FedLRGD}$) algorithm. Since smoothness in the data induces an approximate low rank structure on the gradient of the loss function, our algorithm first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate the clients' gradients using its own gradients. Then, our algorithm solves the ERM problem at the server using an inexact gradient descent method. To theoretically demonstrate that $\mathtt{FedLRGD}$ can outperform $\mathtt{FedAve}$, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity in the optimization literature. Under some assumptions on the loss function, e.g., strong convexity and smoothness in the parameter, $\eta$-Hölder class smoothness in the data, etc., we prove that the federated oracle complexity of $\mathtt{FedLRGD}$ scales like $\phi m (p/\epsilon)^{\Theta(d/\eta)}$ and that of $\mathtt{FedAve}$ ...
ISSN: 0018-9448, 1557-9654
DOI: 10.1109/TIT.2023.3317168
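
The abstract describes a two-phase procedure: a few communication rounds in which the server learns weights for approximating the clients' gradients from its own gradients, followed by inexact gradient descent run entirely at the server. The Python sketch below is only a toy illustration of that structure, not the paper's algorithm: the single client, the logistic loss, the randomly drawn probe parameters, the least-squares weight fit, and the helper names per_sample_grads and approx_full_grad are all assumptions introduced here for demonstration.

```python
# Toy rendering of the two-phase idea described in the abstract. Everything
# below is an assumption made for illustration: a single client, a logistic
# loss, probe parameters drawn at random, and a least-squares fit of the
# approximation weights stand in for the paper's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
p, d = 5, 3                        # parameter dimension p, data dimension d
n_server, n_client = 20, 200       # the server holds a few samples, the client many

def per_sample_grads(w, X, y):
    # Per-sample gradients of the logistic loss at parameter w, shape (n, p).
    s = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (s - y)[:, None] * X

# Synthetic data whose features depend smoothly on a d-dimensional data variable,
# so gradients vary smoothly with the data (the source of the approximate low
# rank structure the abstract refers to).
M = rng.normal(size=(d, p))
w_star = rng.normal(size=p)
def make_data(n):
    U = rng.normal(size=(n, d))    # latent data variables
    X = np.tanh(U @ M)             # smooth feature map
    y = (X @ w_star > 0).astype(float)
    return X, y

X_s, y_s = make_data(n_server)     # stays at the server
X_c, y_c = make_data(n_client)     # stays at the client

# Phase 1 (a few communication rounds): at a handful of probe parameters the
# client reports its average gradient, and the server fits weights alpha so that
# a linear combination of its own per-sample gradients reproduces it.
probes = rng.normal(size=(10, p))
A = np.vstack([per_sample_grads(w, X_s, y_s).T for w in probes])                  # (10*p, n_server)
b = np.concatenate([per_sample_grads(w, X_c, y_c).mean(axis=0) for w in probes])  # (10*p,)
alpha, *_ = np.linalg.lstsq(A, b, rcond=None)

# Phase 2: inexact gradient descent entirely at the server, reusing alpha to
# approximate the client's contribution without further communication.
def approx_full_grad(w):
    G_s = per_sample_grads(w, X_s, y_s)
    g_client_hat = G_s.T @ alpha               # surrogate for the client's average gradient
    g_server = G_s.mean(axis=0)
    return (n_server * g_server + n_client * g_client_hat) / (n_server + n_client)

w = np.zeros(p)
for _ in range(300):
    w -= 0.5 * approx_full_grad(w)

# Sanity check (touches client data once more, only to measure the surrogate's error).
g_true = (n_server * per_sample_grads(w, X_s, y_s).mean(axis=0)
          + n_client * per_sample_grads(w, X_c, y_c).mean(axis=0)) / (n_server + n_client)
print("gradient surrogate error at the final iterate:",
      np.linalg.norm(approx_full_grad(w) - g_true))
```

The design point the sketch tries to mirror is the one stated in the abstract: because the loss is smooth in the data, its gradient admits an approximate low rank structure, so a fixed set of weights learned in a few communication rounds lets the server stand in for the clients during the remaining optimization.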