RALR: Random Amplify Learning Rates for Training Neural Networks

Bibliographic Details
Published in:	Applied sciences 2022-01, Vol.12 (1), p.268
Main Authors:	Deng, Jiali; Gong, Haigang; Liu, Minghui; Xie, Tianshu; Cheng, Xuan; Wang, Xiaomin; Liu, Ming
Format:	Article
Language:	English
Subjects:	Accuracy; Classification; Deep learning; deep neural networks; Image classification; Learning; learning rate; Machine translation; Mathematical functions; Methods; Neural networks; Optimization algorithms; RALR; Regularization; Saddle points
Online Access:	Full text

Description
Summary:	The learning rate has been shown to be one of the most critical hyper-parameters for the overall performance of deep neural networks. In this paper, we propose a new method for setting the global learning rate, named random amplify learning rates (RALR), to improve the performance of any optimizer when training deep neural networks. Instead of monotonically decreasing the learning rate, RALR amplifies it, with a given probability, within reasonable boundary values, with the aim of escaping saddle points or local minima. Training with RALR rather than with a conventionally decreasing learning rate further improves network performance without additional resource consumption. Notably, RALR is complementary to state-of-the-art data augmentation and regularization methods. We empirically study its performance on image classification, fine-grained classification, object detection, and machine translation tasks. Experiments demonstrate that RALR brings a notable improvement while preventing overfitting when training deep neural networks. For example, the classification accuracy of ResNet-110 trained on the CIFAR-100 dataset using RALR is 1.34% higher than that of ResNet-110 trained traditionally.
ISSN:	2076-3417
DOI:	10.3390/app12010268
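
The summary describes the idea only in words, so the following is a minimal sketch of what "amplifying the learning rate within boundary values with a given probability" might look like, not the authors' implementation. Everything concrete here is an assumption made for illustration: the step-decay baseline, the function names `step_decay` and `randomly_amplified_lr`, the probability `prob`, the factor range, and the upper bound `lr_cap` are all hypothetical and do not come from the record.

```python
import random


def step_decay(epoch, base_lr=0.1, gamma=0.1, milestones=(150, 225)):
    """Conventional monotonically decreasing schedule (assumed baseline)."""
    drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** drops


def randomly_amplified_lr(epoch, rng, prob=0.1,
                          min_factor=1.0, max_factor=10.0, lr_cap=0.1):
    """Return the baseline rate, amplified with probability `prob`.

    All parameters are illustrative assumptions, not values from the paper.
    The amplified rate stays between the baseline rate and `lr_cap`.
    """
    lr = step_decay(epoch)
    if rng.random() < prob:
        # Draw a random amplification factor and clamp to the upper bound.
        factor = rng.uniform(min_factor, max_factor)
        lr = min(lr * factor, lr_cap)
    return lr


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are reproducible
    for epoch in (0, 100, 160, 200, 230, 299):
        print(epoch, step_decay(epoch), randomly_amplified_lr(epoch, rng))
```

In a training loop, the returned value would be assigned to the optimizer's global learning rate at the start of each epoch. Setting `prob=0.0` recovers the plain decreasing schedule, which makes the two approaches easy to compare.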