CDiNN - Convex Difference Neural Networks
Format: | Article |
---|---|
Language: | English |
Abstract: | Neural networks with ReLU activation functions have been shown to be universal
function approximators, and they learn the input-output mapping as a non-smooth function.
Recently, there has been considerable interest in using neural networks in
applications such as optimal control. It is well known that optimization
involving non-convex, non-smooth functions is computationally intensive and
has limited convergence guarantees. Moreover, the choice of optimization
hyper-parameters used in gradient descent/ascent significantly affects the
quality of the obtained solutions. A new neural network architecture, the
Input Convex Neural Network (ICNN), learns the output as a convex function of
the inputs, thereby allowing the use of efficient convex optimization methods. Using
ICNNs to determine the input that minimizes the output has two major
problems: learning a non-convex function as a convex mapping could result in
significant function approximation error, and we also note that the existing
representations cannot capture simple dynamic structures such as linear time-delay
systems. We attempt to address these problems by introducing a new
neural network architecture, which we call CDiNN, that learns the function
from data as a difference of polyhedral convex functions. We also discuss how,
in some cases, the optimal input can be obtained from CDiNN through difference-of-convex
(DC) optimization with convergence guarantees, and how, at each iteration,
the problem reduces to a linear programming problem. |
---|---|
DOI: | 10.48550/arxiv.2103.17231 |
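The abstract describes ICNNs only at a high level. The following is a minimal sketch, assuming the standard input-convex layer form (non-negative hidden-to-hidden weights, a convex non-decreasing activation such as ReLU, and unconstrained direct input weights); the helpers `init_icnn` and `icnn` and the random initialization are hypothetical, for illustration only, and are not the CDiNN authors' code.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


def init_icnn(in_dim, hidden, seed=0):
    """Random parameters for a small input-convex network (hypothetical helper).

    `hidden` lists the hidden-layer widths; a scalar output layer is appended.
    """
    rng = np.random.default_rng(seed)
    params, prev = [], None
    for width in list(hidden) + [1]:
        params.append({
            # Direct input-to-layer weights may have any sign.
            "Wx": rng.normal(size=(width, in_dim)),
            "b": rng.normal(size=width),
            # Hidden-to-hidden weights are made non-negative; this is the
            # constraint that keeps the output convex in the input.
            "Wz": None if prev is None else np.abs(rng.normal(size=(width, prev))),
        })
        prev = width
    return params


def icnn(params, x):
    """Scalar output f(x), convex in x by construction."""
    z = None
    for i, layer in enumerate(params):
        pre = layer["Wx"] @ x + layer["b"]
        if layer["Wz"] is not None:
            pre = pre + layer["Wz"] @ z
        # ReLU is convex and non-decreasing, so it preserves convexity;
        # the output layer is left affine in z.
        z = pre if i == len(params) - 1 else relu(pre)
    return float(z[0])


# Numerical sanity check of convexity (Jensen's inequality) along random segments.
params = init_icnn(in_dim=2, hidden=[8, 8])
rng = np.random.default_rng(1)
for _ in range(200):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = icnn(params, lam * x + (1 - lam) * y)
    rhs = lam * icnn(params, x) + (1 - lam) * icnn(params, y)
    assert lhs <= rhs + 1e-8
```

Because the output is convex in x, minimizing it over a convex input set is a convex problem. The catch the abstract points out is that this convexity is imposed regardless of the data, so a non-convex target can only be approximated by a convex one.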
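The claim that each DC iteration reduces to a linear program can be illustrated with the basic DC algorithm (DCA). This is a minimal sketch, assuming f = g - h with both g and h given explicitly as max-affine (polyhedral convex) functions and the input restricted to a box. In CDiNN the two convex parts would come from the trained network, which is not modelled here. At each iteration h is replaced by its linearization at the current point, and minimizing the polyhedral g minus a linear term is an LP in epigraph form. The function names and the toy g and h are hypothetical, and SciPy's `linprog` stands in for whatever LP solver the authors use.

```python
import numpy as np
from scipy.optimize import linprog


def max_affine(A, c, x):
    """Polyhedral convex function: max_i (A[i] @ x + c[i])."""
    return float(np.max(A @ x + c))


def dc_value(G, g0, H, h0, x):
    """f(x) = g(x) - h(x), with g and h both max-affine."""
    return max_affine(G, g0, x) - max_affine(H, h0, x)


def dca_subproblem(G, g0, s, lo, hi):
    """Minimize g(x) - s @ x over the box [lo, hi] as an LP in (x, t).

    Epigraph form: minimize t - s @ x subject to G @ x + g0 <= t.
    """
    n = G.shape[1]
    cost = np.concatenate([-s, [1.0]])
    A_ub = np.hstack([G, -np.ones((G.shape[0], 1))])  # G x - t <= -g0
    bounds = list(zip(lo, hi)) + [(None, None)]        # t is free
    res = linprog(cost, A_ub=A_ub, b_ub=-g0, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


def dca(G, g0, H, h0, x0, lo, hi, max_iter=50, tol=1e-9):
    """Basic DC algorithm: linearize h at the current point, then solve the LP."""
    x = np.asarray(x0, dtype=float)
    f = dc_value(G, g0, H, h0, x)
    for _ in range(max_iter):
        # A subgradient of h at x is the slope of any active affine piece.
        s = H[np.argmax(H @ x + h0)]
        x_new = dca_subproblem(G, g0, s, lo, hi)
        f_new = dc_value(G, g0, H, h0, x_new)
        if f_new > f - tol:  # no meaningful decrease: stop
            break
        x, f = x_new, f_new
    return x, f


# Toy 1-D example: g(x) = |x|, h(x) = max(2x - 1, -2x - 1, 0),
# so f(x) = |x| for |x| <= 0.5 and 1 - |x| otherwise (non-convex).
G, g0 = np.array([[1.0], [-1.0]]), np.array([0.0, 0.0])
H, h0 = np.array([[2.0], [-2.0], [0.0]]), np.array([-1.0, -1.0, 0.0])
lo, hi = [-3.0], [3.0]
x_star, f_star = dca(G, g0, H, h0, [1.0], lo, hi)
```

Starting from x0 = 1, the iteration moves to the box corner x = 3, where f = -2. Starting from x0 = 0.2, it instead stops at the local minimizer x = 0. This is a reminder that DCA guarantees a monotone decrease and convergence to a critical point, not a global minimum.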