ABNGrad: adaptive step size gradient descent for optimizing neural networks
Saved in:
Published in: Applied Intelligence (Dordrecht, Netherlands), 2024-02, Vol. 54 (3), p. 2361-2378
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Stochastic adaptive gradient descent algorithms, such as AdaGrad and Adam, are extensively used to train deep neural networks. However, randomly sampled gradient information introduces instability into the learning rates, leading to adaptive methods with poor generalization. To address this issue, the ABNGrad algorithm, which leverages the absolute value operation and the normalization technique, is proposed. More specifically, the absolute value function is first incorporated into the iteration of the second-order moment estimate to ensure that it increases monotonically. Then, the normalization technique is employed to prevent a rapid decrease in the learning rate. In particular, the techniques used in this paper can also be integrated into other existing adaptive algorithms, such as Adam, AdamW, AdaBound, and RAdam, yielding good performance. Additionally, it is shown that ABNGrad attains the optimal regret bound for online convex optimization problems. Finally, extensive experimental results illustrate the effectiveness of ABNGrad. For a comprehensive exploration of the advantages of the proposed approach and the specifics of its implementation, readers are referred to the following repository:
https://github.com/Wenhan-Jiang/ABNGrad.git
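For readers who want a rough picture before opening the repository, the following is a minimal NumPy sketch of how the two ingredients described in the abstract (an absolute-value correction that keeps the second-moment estimate non-decreasing, and a normalization of that estimate to keep the learning rate from collapsing) might slot into an Adam-style update. The function name `abngrad_like_step`, the specific recursions, and the hyperparameter defaults are illustrative assumptions, not the ABNGrad update rules from the paper; consult the repository above for the authors' implementation.

```python
# Hypothetical sketch of an Adam-style step with (i) an absolute-value term so
# the second-moment estimate never decreases and (ii) a normalization of that
# estimate so the effective learning rate does not shrink too quickly.
# These formulas are only one plausible reading of the abstract, not the
# paper's actual ABNGrad recursion.
import numpy as np

def abngrad_like_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of parameter vector `theta` given gradient `grad`.

    `state` holds the running first moment `m`, a non-decreasing
    second-moment surrogate `v`, and the step counter `t`.
    """
    m, v, t = state["m"], state["v"], state["t"] + 1

    # First moment: standard exponential moving average of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad

    # Second moment with an absolute-value correction: adding |candidate - v|
    # guarantees the estimate is monotonically non-decreasing (assumption:
    # a stand-in for the paper's absolute-value iteration).
    v_candidate = beta2 * v + (1.0 - beta2) * grad ** 2
    v = v + np.abs(v_candidate - v)

    # Normalize the accumulated second moment so a large value does not
    # collapse the step size (a guess at the "normalization technique").
    v_hat = v / (np.linalg.norm(v) + eps)

    # Bias-corrected, Adam-style parameter update.
    m_hat = m / (1.0 - beta1 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

    state.update(m=m, v=v, t=t)
    return theta, state

# Example usage on a toy quadratic objective f(x) = ||x||^2 (gradient 2x):
theta = np.ones(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
for _ in range(100):
    theta, state = abngrad_like_step(theta, 2.0 * theta, state, lr=1e-2)
```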
Graphical abstract
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-024-05303-6