AdaLip: An Adaptive Learning Rate Method per Layer for Stochastic Optimization



Bibliographic Details
Published in: Neural Processing Letters, 2023-10, Vol. 55 (5), p. 6311-6338
Main Authors: Ioannou, George; Tagaris, Thanos; Stafylopatis, Andreas
Format: Article
Language: English
Online Access: Full text
Abstract: Various works on the optimization of neural networks have emphasized the significance of the learning rate. In this study we analyze the need for a different treatment for each layer and how this affects training. We propose a novel optimization technique, called AdaLip, that utilizes an estimation of the Lipschitz constant of the gradients in order to construct an adaptive learning rate per layer that can work on top of existing optimizers, such as SGD or Adam. A detailed experimental framework was used to demonstrate the usefulness of the optimizer on three benchmark datasets. The experiments showed that AdaLip not only improves training performance and convergence speed, but also makes the training process more robust to the selection of the initial global learning rate.
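
The abstract does not spell out AdaLip's exact update rule, so the following is only a minimal sketch of the general idea it describes, written in PyTorch: estimate a local Lipschitz constant per layer from consecutive gradients and weights, L_t ≈ ||g_t - g_{t-1}|| / ||w_t - w_{t-1}||, and scale the base step by 1/L_t on top of a plain SGD update. The class name LipschitzLRScaler and every implementation detail below are illustrative assumptions, not the authors' code.

    import torch

    class LipschitzLRScaler:
        """Per-layer learning-rate scaling from a local Lipschitz estimate (illustrative sketch)."""

        def __init__(self, params, base_lr=0.1, eps=1e-12):
            self.params = list(params)            # one entry per layer tensor
            self.base_lr = base_lr                # global learning rate to be rescaled per layer
            self.eps = eps                        # guards against division by zero
            self.prev_w = [None] * len(self.params)
            self.prev_g = [None] * len(self.params)

        @torch.no_grad()
        def step(self):
            for i, p in enumerate(self.params):
                if p.grad is None:
                    continue
                g = p.grad
                if self.prev_w[i] is None:
                    lr = self.base_lr                     # no history yet: fall back to the global lr
                else:
                    dw = (p - self.prev_w[i]).norm()
                    dg = (g - self.prev_g[i]).norm()
                    lip = dg / (dw + self.eps)            # finite-difference Lipschitz estimate for this layer
                    lr = self.base_lr / (lip + self.eps)  # faster-changing gradients -> smaller step
                self.prev_w[i] = p.clone()                # remember w_t and g_t for the next estimate
                self.prev_g[i] = g.clone()
                p.add_(g, alpha=-lr)                      # plain SGD step with the per-layer lr

In practice the per-layer factor would more likely be fed into the learning rates of an existing optimizer's parameter groups (e.g. SGD or Adam), as the paper proposes layering AdaLip on top of such optimizers; the hand-rolled SGD loop above only keeps the sketch self-contained.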
ISSN: 1370-4621, 1573-773X
DOI: 10.1007/s11063-022-11140-w