PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2020-12, Vol. 31 (12), p. 5079-5091
Main authors:	Wang, Haoqian; Luo, Yi; An, Wangpeng; Sun, Qingyun; Xu, Jun; Zhang, Lei
Format:	Article
Language:	English
Subjects:	Acceleration; Algorithms; Artificial neural networks; Computer vision; Controllers; Convergence; Deep neural network (DNN); Natural language processing; Neural networks; Optimization; Parameters; Pattern recognition; PD control; Proportional integral derivative; proportional-integral-derivative (PID) control; stochastic gradient descent (SGD)-momentum; Stochastic processes; Stochasticity; Training

Description
Abstract:	Deep neural networks (DNNs) are widely used and have demonstrated their power in many applications, such as computer vision and pattern recognition. However, training these networks can be time-consuming, a problem that efficient optimizers can alleviate. One of the most commonly used optimizers, stochastic gradient descent-momentum (SGD-M), uses past and present gradients for parameter updates. During network training, however, SGD-M can suffer from drawbacks such as overshoot, which slows convergence. To alleviate this problem and accelerate the convergence of DNN optimization, we propose a proportional-integral-derivative (PID) approach. Specifically, we first investigate the intrinsic relationship between the PID controller and SGD-M. We then propose a PID-based optimization algorithm that updates the network parameters using the past gradients, the current gradient, and the change of the gradient. As a result, the proposed PID-based optimization alleviates the overshoot problem suffered by SGD-M. When tested on popular DNN architectures, it also obtains up to 50% acceleration with competitive accuracy. Extensive experiments on computer vision and natural language processing tasks demonstrate the effectiveness of our method on benchmark data sets, including CIFAR10, CIFAR100, Tiny-ImageNet, and PTB. The code is released at https://github.com/tensorboy/PIDOptimizer.
ISSN:	2162-237X; 2162-2388
DOI:	10.1109/TNNLS.2019.2963066
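
Illustrative sketch: The abstract describes an update that combines past gradients, the current gradient, and the change of the gradient, but it does not state the update equations. The following Python sketch shows one plausible form under stated assumptions: the momentum buffer plays the role of the proportional and integral terms (as in SGD-M), and a smoothed gradient difference scaled by a hypothetical gain `kd` plays the role of the derivative term. The function name `pid_descent`, its parameters, and the exact form of the derivative term are illustrative choices, not the authors' released implementation; the repository linked in the abstract is the place to check the actual method.

```python
from typing import Callable, List


def pid_descent(
    grad_fn: Callable[[float], float],
    theta0: float,
    steps: int,
    lr: float = 0.1,
    momentum: float = 0.9,
    kd: float = 0.0,
) -> List[float]:
    """Minimize a scalar function with a PID-style update (assumed form).

    velocity   : momentum-weighted sum of past and present gradients (SGD-M).
    derivative : moving average of the change between consecutive gradients.
    kd         : hypothetical derivative gain; kd = 0 reduces to plain SGD-M.
    """
    theta = theta0
    velocity = 0.0
    derivative = 0.0
    prev_grad = None
    trajectory = [theta]
    for _ in range(steps):
        grad = grad_fn(theta)
        # SGD-M part: accumulate past gradients and add the current one.
        velocity = momentum * velocity - lr * grad
        # Derivative part: smoothed change of the gradient (zero on step one).
        if prev_grad is not None:
            derivative = momentum * derivative + (1.0 - momentum) * (grad - prev_grad)
        prev_grad = grad
        # The derivative term opposes rapid gradient change, damping overshoot.
        theta = theta + velocity - kd * derivative
        trajectory.append(theta)
    return trajectory


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): f(x) = x**2 / 2, so grad = x.
    sgdm = pid_descent(lambda x: x, theta0=1.0, steps=12, kd=0.0)
    pid = pid_descent(lambda x: x, theta0=1.0, steps=12, kd=1.0)
    # The minimum is at 0, so the most negative value measures the overshoot.
    print("SGD-M overshoot:", -min(sgdm))
    print("PID   overshoot:", -min(pid))
```

On this one-dimensional quadratic, the run with `kd=1.0` passes the minimum by a smaller margin than the run with `kd=0.0`, which is the qualitative behavior the abstract attributes to the derivative term. The toy problem and gain values say nothing about the reported speedups on real networks.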