A new approach for the vanishing gradient problem on sigmoid activation
Saved in:
Published in: Progress in Artificial Intelligence, 2020-12, Vol. 9 (4), pp. 351-360
Main authors:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: The vanishing gradient problem (VGP) is an important issue at training time in multilayer neural networks trained with the backpropagation algorithm. The problem is worse when sigmoid transfer functions are used in a network with many hidden layers. However, the sigmoid function is very important in several architectures, such as recurrent neural networks and autoencoders, where the VGP may also appear. In this article, we propose a modification of the backpropagation algorithm for training sigmoid neurons: a small constant is added to the calculation of the sigmoid's derivative, so that the proposed training direction differs slightly from the gradient while the original sigmoid function is kept in the network. Our results suggest that the modified derivative reaches the same accuracy in fewer training steps on most datasets. Moreover, due to the VGP, the original derivative does not converge with sigmoid activations on more than five hidden layers, whereas the modification allows backpropagation to train two extra hidden layers in feedforward neural networks.
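The abstract describes the idea only at a high level. Below is a minimal NumPy sketch of how such a modified derivative could be plugged into a plain backpropagation loop: the forward pass keeps the ordinary sigmoid, while the backward pass uses sigmoid'(z) plus a small constant. The constant value (`eps = 0.1`), the layer sizes, the squared-error loss, and the training loop are illustrative assumptions; the paper's exact settings are not given in this record.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modified_sigmoid_deriv(a, eps=0.1):
    # Standard derivative a*(1-a) plus a small constant, so the backward
    # signal never shrinks to zero when the neuron saturates.
    # eps = 0.1 is a hypothetical choice; the abstract only says "a small constant".
    return a * (1.0 - a) + eps

# Toy two-layer network on random data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))                      # 32 samples, 10 features
y = rng.integers(0, 2, size=(32, 1)).astype(float)  # binary targets

W1 = rng.normal(scale=0.1, size=(10, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))
lr = 0.5

for step in range(100):
    # Forward pass uses the unmodified sigmoid (the network itself is unchanged).
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Backward pass replaces sigmoid'(z) with the modified derivative,
    # so the update direction differs slightly from the true gradient.
    d_out = (out - y) * modified_sigmoid_deriv(out)
    d_h = (d_out @ W2.T) * modified_sigmoid_deriv(h)

    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_h / len(X)
```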
ISSN: 2192-6352, 2192-6360
DOI: 10.1007/s13748-020-00218-y