Natural Gradient Primal-Dual Method for Decentralized Learning
Published in: IEEE Transactions on Signal and Information Processing over Networks, 2024, Vol. 10, pp. 417-433
Main authors: , , , , ,
Format: Article
Language: English
Abstract: We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of the parameters of Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters of connected nodes to be similar. However, since most of them follow a first-order optimization method and the loss functions of DNNs may have ill-conditioned curvatures, many local parameter updates and many rounds of communication among local nodes are needed. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method, yielding NGPD. Since the additional constraint minimizes the change in the network outputs before and after each parameter update, robustness to ill-conditioned curvatures is expected. We theoretically derive the convergence rate for the averaged parameter (the average of the local parameters) under certain assumptions. As a practical implementation of NGPD that avoids a significant increase in computational overhead, we introduce Kronecker-Factored Approximate Curvature (K-FAC). Our experimental results confirmed that NGPD achieved the highest test accuracy among the compared methods on image classification tasks using DNNs.
ISSN: 2373-776X, 2373-7778
DOI: 10.1109/TSIPN.2024.3388948
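
The abstract describes NGPD only at a high level. As a rough illustration of the idea it sketches, the Python snippet below shows a consensus-constrained primal-dual round in which each node preconditions its primal step with a damped local Fisher matrix rather than taking a plain first-order step. This is a minimal sketch under assumptions, not the paper's algorithm: the function name `ngpd_round`, the penalty weight `rho`, the damping term, and the exact placement of the dual and consensus terms are hypothetical and may differ from the authors' actual updates.

```python
import numpy as np

def ngpd_round(params, duals, grads, fishers, neighbors,
               eta=0.1, rho=1.0, damping=1e-3):
    """One synchronous primal-dual round over all nodes (illustrative only)."""
    n = len(params)
    new_params = []
    for i in range(n):
        # Consensus term: pull theta_i toward the parameters of its neighbors.
        consensus = sum(params[i] - params[j] for j in neighbors[i])
        # Primal gradient of an augmented-Lagrangian-style local objective.
        primal_grad = grads[i] + duals[i] + rho * consensus
        # Natural-gradient step: solve against the damped local Fisher matrix
        # instead of taking a plain (first-order) gradient step.
        F = fishers[i] + damping * np.eye(len(params[i]))
        new_params.append(params[i] - eta * np.linalg.solve(F, primal_grad))
    # Dual ascent on the remaining consensus gap after the primal step.
    new_duals = [duals[i] + rho * sum(new_params[i] - new_params[j]
                                      for j in neighbors[i])
                 for i in range(n)]
    return new_params, new_duals

# Tiny usage example: a 3-node ring with 2-dimensional parameters and
# stand-in gradients/Fisher estimates.
rng = np.random.default_rng(0)
params = [rng.standard_normal(2) for _ in range(3)]
duals = [np.zeros(2) for _ in range(3)]
grads = [p.copy() for p in params]       # stand-in for local loss gradients
fishers = [np.eye(2) for _ in range(3)]  # stand-in for local Fisher estimates
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
params, duals = ngpd_round(params, duals, grads, fishers, neighbors)
```

The abstract names K-FAC as the practical way to keep the second-order step affordable. The standard K-FAC identity (assumed here to be the one the paper uses; shapes and damping are illustrative) computes the natural-gradient direction for a layer's weight matrix from two small factor inverses instead of one large Fisher inverse:

```python
import numpy as np

# If a layer's Fisher block is approximated as a Kronecker product
# F ≈ A ⊗ G, then (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}),
# so only the small factors A and G ever need to be inverted.
rng = np.random.default_rng(0)
d_in, d_out = 4, 3
A = np.cov(rng.standard_normal((d_in, 100))) + 1e-3 * np.eye(d_in)    # input-activation covariance
G = np.cov(rng.standard_normal((d_out, 100))) + 1e-3 * np.eye(d_out)  # backprop-gradient covariance
V = rng.standard_normal((d_out, d_in))   # loss gradient w.r.t. the weight matrix
nat_grad = np.linalg.solve(G, V) @ np.linalg.inv(A)  # = G^{-1} V A^{-1}
```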