AdaXod: a new adaptive and momental bound algorithm for training deep neural networks

Adaptive algorithms are widely used in deep learning because of their fast convergence. Among them, Adam is the most widely used algorithm. However, studies have shown that Adam’s generalization ability is weak. AdaX is a variant of Adam, which introduces a novel second-order momentum, modifies the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The Journal of supercomputing 2023-10, Vol.79 (15), p.17691-17715
Hauptverfasser:	Liu, Yuanxuan, Li, Dequan
Format:	Artikel
Sprache:	eng
Schlagworte:	Adaptive algorithms Algorithms Artificial neural networks Compilers Computer Science Convergence Deep learning Interpreters Machine learning Neural networks Optimization Processor Architectures Programming Languages Training
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Adaptive algorithms are widely used in deep learning because of their fast convergence. Among them, Adam is the most widely used algorithm. However, studies have shown that Adam’s generalization ability is weak. AdaX is a variant of Adam, which introduces a novel second-order momentum, modifies the second-order moment of Adam, and has good generalization ability. However, these algorithms may fail to converge due to instability and extreme learning rates during training. In this paper, we propose a new adaptive and momental bound algorithm, called AdaXod, which characterizes of exponentially averaging the learning rate and is particularly useful for training deep neural networks. By setting an adaptively limited learning rate in the AdaX algorithm, the resultant AdaXod can effectively eliminate the problem of excessive learning rate in the later stage of neural networks training and thus results in stable training. We conduct extensive experiments on different datasets and verify the advantages of the AdaXod algorithm by comparing with other advanced adaptive optimization algorithms. AdaXod eliminates large learning rates during neural networks training and outperforms other optimizers, especially for some neural networks with complex structures, such as DenseNet.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-023-05338-5