Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods
Format: Article
Language: English
Abstract: Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composition structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We give detailed local convergence rates when the KL exponent $\theta$ varies in $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
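For orientation, the sketch below illustrates the standard objects the abstract refers to: a penalized AM reformulation of network training, one plausible form of a non-monotone $j$-step sufficient decrease condition, and the KL inequality together with the rates usually associated with the exponent $\theta$ in the KL literature. The symbols ($W_\ell$, $V_\ell$, $\gamma$, $a$, $c$) and the exact conditions are illustrative assumptions, not the paper's definitions.

```latex
% Illustrative sketch only: the conditions and symbols below follow the
% standard KL convergence literature and may differ from the paper's
% exact definitions.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

\paragraph{AM splitting (illustrative).} With auxiliary layer variables
$V_\ell$, a penalized reformulation of $L$-layer network training is
\[
  \min_{\{W_\ell\},\{V_\ell\}}
  \; \mathcal{L}(V_L, y)
  + \frac{\gamma}{2} \sum_{\ell=1}^{L}
    \bigl\| V_\ell - \sigma(W_\ell V_{\ell-1}) \bigr\|^2 ,
\]
and an AM method minimizes over one block ($W_\ell$ or $V_\ell$) at a time.

\paragraph{Non-monotone $j$-step sufficient decrease (one plausible form).}
For iterates $z^k = (\{W_\ell^k\},\{V_\ell^k\})$ and some $a>0$,
\[
  F(z^{k+j}) \le F(z^k) - a\,\| z^{k+1} - z^k \|^2 ,
\]
which allows the objective to increase within a window of $j$ steps.

\paragraph{KL property and typical rates.} $F$ has the KL property at
$z^\ast$ with exponent $\theta \in [0,1)$ if, near $z^\ast$,
\[
  | F(z) - F(z^\ast) |^{\theta}
  \le c \, \operatorname{dist}\bigl(0, \partial F(z)\bigr).
\]
In the standard KL framework, $\theta = 0$ gives convergence in finitely
many steps, $\theta \in (0, \tfrac12]$ gives R-linear convergence, and
$\theta \in (\tfrac12, 1)$ gives a sublinear rate
$O\bigl(k^{-\frac{1-\theta}{2\theta-1}}\bigr)$.

\end{document}
```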
DOI: 10.48550/arxiv.2208.14318