Learning Activation Functions in Deep (Spline) Neural Networks

Bibliographic details
Published in:	IEEE Open Journal of Signal Processing, 2020, Vol. 1, pp. 295-309
Authors:	Bohra, Pakshal; Campos, Joaquim; Gupta, Harshit; Aziznejad, Shayan; Unser, Michael
Format:	Article
Language:	English
Keywords:	Activation functions; Artificial neural networks; B spline functions; B-splines; Basis functions; Computational efficiency; Deep learning; Free form; Functionals; Machine learning; Neural networks; Neurons; Optimization; Regularization; Sparsity; Splines (mathematics); Training

Description
Abstract:	We develop an efficient computational solution to train deep neural networks (DNNs) with free-form activation functions. To make the problem well-posed, we augment the cost functional of the DNN by adding an appropriate shape regularization: the sum of the second-order total variations of the trainable nonlinearities. The representer theorem for DNNs tells us that the optimal activation functions are adaptive piecewise-linear splines, which allows us to recast the problem as a parametric optimization. The challenge is that the corresponding basis functions (ReLUs) are poorly conditioned, and that the determination of their number and positioning is also part of the problem. We circumvent this difficulty by using an equivalent B-spline basis to encode the activation functions and by expressing the regularization as an ℓ1-penalty. This results in the specification of parametric activation function modules that can be implemented and optimized efficiently on standard development platforms. We present experimental results that demonstrate the benefit of our approach.
ISSN:	2644-1322
DOI:	10.1109/OJSP.2020.3039379
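The abstract describes encoding each activation function in a B-spline basis and expressing the second-order total-variation regularization as an ℓ1-penalty. The following is a minimal Python sketch of that idea. It is not the authors' implementation and relies on assumptions that this record does not state: a uniform knot grid with spacing T centred at 0, an odd number (at least 3) of coefficients, and linear extension outside the grid. All function names (`hat`, `spline_activation`, `tv2`, `relu_weights`, `relu_form`) are hypothetical. The sketch evaluates a linear B-spline activation, computes its second-order total variation as an ℓ1 norm of second finite differences of the coefficients, and checks that this equals the ℓ1 norm of the weights of the equivalent ReLU expansion.

```python
# Minimal sketch, NOT the authors' implementation. Assumptions (not from the
# paper record): uniform knot spacing T, an odd number 2K+1 >= 3 of
# coefficients on knots -K*T, ..., K*T, and linear extension outside the grid.
# All function names below are hypothetical.


def hat(t):
    """Linear B-spline (triangle function) supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))


def spline_activation(x, coeffs, T):
    """sigma(x) = sum_k c_k * hat(x / T - k) for knot indices k = -K..K."""
    K = len(coeffs) // 2
    lo, hi = -K * T, K * T
    if x < lo:  # extend with the slope of the leftmost segment
        return coeffs[0] + (coeffs[1] - coeffs[0]) / T * (x - lo)
    if x > hi:  # extend with the slope of the rightmost segment
        return coeffs[-1] + (coeffs[-1] - coeffs[-2]) / T * (x - hi)
    # Inside the grid at most two hats are nonzero: linear interpolation.
    return sum(c * hat(x / T - (k - K)) for k, c in enumerate(coeffs))


def tv2(coeffs, T):
    """Second-order total variation of sigma: the sum of absolute slope changes
    at the interior knots, i.e. an l1 norm of second finite differences."""
    return sum(
        abs(coeffs[k + 1] - 2 * coeffs[k] + coeffs[k - 1])
        for k in range(1, len(coeffs) - 1)
    ) / T


def relu_weights(coeffs, T):
    """Slope jumps a_k at the interior knots; these are the weights of the
    equivalent ReLU expansion sigma(x) = b0 + b1*x + sum_k a_k*ReLU(x - tau_k)."""
    return [
        (coeffs[k + 1] - 2 * coeffs[k] + coeffs[k - 1]) / T
        for k in range(1, len(coeffs) - 1)
    ]


def relu_form(x, coeffs, T):
    """Evaluate the same function through its ReLU expansion."""
    K = len(coeffs) // 2
    x0 = -K * T
    out = coeffs[0] + (coeffs[1] - coeffs[0]) / T * (x - x0)  # leftmost line
    for k, a in enumerate(relu_weights(coeffs, T), start=1):
        tau = (k - K) * T  # position of interior knot k
        out += a * max(0.0, x - tau)
    return out


if __name__ == "__main__":
    c, T = [0.5, -1.0, 2.0, 0.0, 1.5], 0.5
    for x in (-3.0, -0.7, 0.2, 0.9, 2.5):
        print(x, spline_activation(x, c, T), relu_form(x, c, T))
    print("TV2:", tv2(c, T), "=", sum(abs(a) for a in relu_weights(c, T)))
```

For example, `spline_activation(x, [0, 0, 0, 1, 2], 1.0)` reproduces ReLU(x) for every real x, and `tv2([0, 0, 0, 1, 2], 1.0)` returns 1.0, the single unit slope jump at 0. Each B-spline coefficient only affects the two segments adjacent to its knot, whereas each ReLU weight changes the function on a whole half-line. This locality gives one intuition for why the abstract prefers the B-spline basis over the poorly conditioned ReLU basis.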