Neural Lyapunov Model Predictive Control: Learning Safe Global Controllers from Sub-optimal Examples
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | With a growing interest in data-driven control techniques, Model Predictive
Control (MPC) provides an opportunity to exploit the surplus of data reliably,
particularly while taking safety and stability into account. In many real-world
and industrial applications, it is typical to have an existing control
strategy, for instance, execution by a human operator. The objective of this
work is to improve upon this unknown, safe but suboptimal policy by learning a
new controller that retains safety and stability. Learning how to be safe is
achieved directly from data and from a knowledge of the system constraints. The
proposed algorithm alternately learns the terminal cost and updates the MPC
parameters according to a stability metric. The terminal cost is constructed as
a Lyapunov function neural network with the aim of recovering or extending the
stable region of the initial demonstrator using a short prediction horizon.
Theorems that characterize the stability and performance of the learned MPC in
the presence of model uncertainties and sub-optimality due to function
approximation are presented. The efficacy of the proposed algorithm is
demonstrated on non-linear continuous control tasks with soft constraints. The
proposed approach can also improve upon the initial demonstrator in practice
and achieve better stability than popular reinforcement learning baselines. |
DOI: | 10.48550/arxiv.2002.10451 |
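
The summary mentions a Lyapunov function used as the terminal cost and a stability metric. As a rough illustration only, the sketch below checks a Lyapunov decrease condition V(x+) <= lam * V(x) on sampled states. Everything in it is an assumption made for illustration and is not taken from the paper: the quadratic candidate V (the paper uses a neural network), the hypothetical linear closed-loop dynamics, the contraction factor `lam`, and all function names.

```python
import random


def lyapunov_candidate(x, L, eps=1e-3):
    """Assumed candidate V(x) = ||L x||^2 + eps * ||x||^2.

    Positive definite by construction for any matrix L, since eps > 0.
    """
    Lx = [sum(L[i][j] * x[j] for j in range(len(x))) for i in range(len(L))]
    return sum(v * v for v in Lx) + eps * sum(v * v for v in x)


def closed_loop_step(x):
    """Hypothetical stable closed loop x+ = A x (stand-in for plant plus demonstrator)."""
    A = [[0.9, 0.1], [0.0, 0.8]]
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]


def decrease_violations(L, lam=0.99, n_samples=1000, seed=0):
    """Count sampled states in [-1, 1]^2 where V(x+) > lam * V(x)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        v_now = lyapunov_candidate(x, L)
        v_next = lyapunov_candidate(closed_loop_step(x), L)
        if v_next > lam * v_now:
            violations += 1
    return violations


identity = [[1.0, 0.0], [0.0, 1.0]]
print(decrease_violations(identity))           # mild contraction factor
print(decrease_violations(identity, lam=0.5))  # overly strict contraction factor
```

A violation count of this kind could serve as a simple empirical stability score when comparing candidate functions; the choice of sampling region and contraction factor here is arbitrary.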