ABNGrad: adaptive step size gradient descent for optimizing neural networks
Saved in:
Published in: Applied Intelligence (Dordrecht, Netherlands), 2024-02, Vol. 54 (3), p. 2361-2378
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Stochastic adaptive gradient descent algorithms, such as AdaGrad and Adam, are extensively used to train deep neural networks. However, randomly sampled gradient information introduces instability into the learning rates, leading to adaptive methods with poor generalization. To address this issue, the ABNGrad algorithm, which leverages the absolute value operation and the normalization technique, is proposed. More specifically, the absolute value function is first incorporated into the iteration of the second-order moment estimate to ensure that it increases monotonically. Then, the normalization technique is employed to prevent a rapid decrease in the learning rate. In particular, the techniques used in this paper can also be integrated into other existing adaptive algorithms, such as Adam, AdamW, AdaBound, and RAdam, yielding good performance. Additionally, it is shown that ABNGrad attains the optimal regret bound for online convex optimization problems. Finally, extensive experimental results illustrate the effectiveness of ABNGrad. For a comprehensive exploration of the advantages of the proposed approach and the specifics of its implementation, readers are referred to the following repository:
https://github.com/Wenhan-Jiang/ABNGrad.git
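For readers who want a rough picture before opening the repository, the following is a minimal NumPy sketch of how the two ingredients described in the abstract (an absolute-value correction that keeps the second-moment estimate non-decreasing, and a normalization of that estimate to keep the learning rate from collapsing) might slot into an Adam-style update. The function name `abngrad_like_step`, the specific recursions, and the hyperparameter defaults are illustrative assumptions, not the ABNGrad update rules from the paper; consult the repository above for the authors' implementation.

```python
# Hypothetical sketch of an Adam-style step with (i) an absolute-value term so
# the second-moment estimate never decreases and (ii) a normalization of that
# estimate so the effective learning rate does not shrink too quickly.
# These formulas are only one plausible reading of the abstract, not the
# paper's actual ABNGrad recursion.
import numpy as np

def abngrad_like_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of parameter vector `theta` given gradient `grad`.

    `state` holds the running first moment `m`, a non-decreasing
    second-moment surrogate `v`, and the step counter `t`.
    """
    m, v, t = state["m"], state["v"], state["t"] + 1

    # First moment: standard exponential moving average of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad

    # Second moment with an absolute-value correction: adding |candidate - v|
    # guarantees the estimate is monotonically non-decreasing (assumption:
    # a stand-in for the paper's absolute-value iteration).
    v_candidate = beta2 * v + (1.0 - beta2) * grad ** 2
    v = v + np.abs(v_candidate - v)

    # Normalize the accumulated second moment so a large value does not
    # collapse the step size (a guess at the "normalization technique").
    v_hat = v / (np.linalg.norm(v) + eps)

    # Bias-corrected, Adam-style parameter update.
    m_hat = m / (1.0 - beta1 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

    state.update(m=m, v=v, t=t)
    return theta, state

# Example usage on a toy quadratic objective f(x) = ||x||^2 (gradient 2x):
theta = np.ones(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
for _ in range(100):
    theta, state = abngrad_like_step(theta, 2.0 * theta, state, lr=1e-2)
```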
Graphical abstract
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-024-05303-6