Hybrid Reinforcement Learning for Optimal Control of Non-Linear Switching System

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2023-11, Vol. 34 (11), pp. 9161-9170
Authors:	Li, Xiaofeng; Dong, Lu; Xue, Lei; Sun, Changyin
Format:	Article
Language:	English
Keywords:	Adaptive dynamic programming (DP); Aerospace electronics; Algorithms; Approximation algorithms; Artificial neural networks; Continuity (mathematics); Control systems; Discrete time systems; Heuristic algorithms; hybrid action space; Machine learning; Neural networks; Nonlinear control; Nonlinear systems; normalized advantage value function (NAF); Optimal control; reinforcement learning (RL); Subsystems; Switches; Switching; switching system; Switching systems

Description
Abstract:	Based on the reinforcement learning mechanism, a data-based scheme is proposed to address the optimal control problem of discrete-time non-linear switching systems. Unlike in conventional systems, the control signal of a switching system consists of the active mode (discrete) and the control inputs (continuous). First, the Hamilton-Jacobi-Bellman equation for the hybrid action space is derived, and a two-stage value iteration method is proposed to learn the optimal solution. In addition, a neural network structure is designed by decomposing the Q-function into the value function and the normalized advantage value function, which is quadratic with respect to the continuous control of the subsystems. In this way, the Q-function and the continuous policy can be updated simultaneously at each iteration step, so that the training of hybrid policies is reduced to a one-step procedure. Moreover, a convergence analysis of the proposed algorithm that accounts for approximation error is provided. Finally, the algorithm is evaluated on three different simulation examples. Compared with related work, the results demonstrate the potential of our method.
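The abstract describes the Q-function decomposition only in words. The sketch below is not the authors' implementation. It illustrates one way a quadratic advantage term lets a hybrid (mode, control) policy be read off in one step.

It rests on several assumptions:
- A cost-minimizing convention.
- A per-mode decomposition Q(x, i, u) = V(x, i) + (u - mu_i(x))^T P_i(x) (u - mu_i(x)), with P_i(x) positive definite. The sign is flipped relative to the original NAF formulation, which maximizes reward.
- Hand-written toy "heads" standing in for neural network outputs.

The names heads, q_value, greedy_hybrid_action and brute_force are hypothetical.

```python
import numpy as np

# Hypothetical NAF-style Q-function for a switching system with M modes and a
# continuous control u in R^m (cost-minimizing convention, assumed):
#   Q(x, i, u) = V(x, i) + (u - mu_i(x))^T P_i(x) (u - mu_i(x)),  P_i = L_i L_i^T.
# In a learned model, V, mu_i and the lower-triangular L_i would be network
# outputs; here they are fixed toy functions so the example runs stand-alone.

M, m = 2, 2  # number of modes and control dimension (toy values)


def heads(x):
    """Return per-mode values V[i], greedy controls mu[i], matrices P[i] at state x."""
    V = np.array([x @ x, 0.5 * (x @ x) + 1.0])        # toy mode-dependent values
    mu = np.stack([-0.5 * x, np.tanh(x)])               # toy greedy controls, shape (M, m)
    L = np.stack([np.eye(m), np.array([[2.0, 0.0],      # toy lower-triangular factors
                                       [0.3, 1.0]])])
    P = L @ np.transpose(L, (0, 2, 1))                  # batched L_i L_i^T, positive definite
    return V, mu, P


def q_value(x, i, u):
    """Evaluate the decomposed Q-function for mode i and control u."""
    V, mu, P = heads(x)
    d = u - mu[i]
    return V[i] + d @ P[i] @ d


def greedy_hybrid_action(x):
    """Closed-form greedy hybrid action.

    Because the advantage term is a positive definite quadratic in u, it is
    minimized at u = mu_i(x), where min_u Q(x, i, u) = V(x, i). The best mode
    is therefore argmin_i V(x, i), and no inner search over u is needed.
    """
    V, mu, _ = heads(x)
    i_star = int(np.argmin(V))
    return i_star, mu[i_star], V[i_star]


def brute_force(x, grid):
    """Reference check: exhaustive search over all modes and a 2-D control grid."""
    best = (None, None, np.inf)
    for i in range(M):
        for u1 in grid:
            for u2 in grid:
                u = np.array([u1, u2])
                q = q_value(x, i, u)
                if q < best[2]:
                    best = (i, u, q)
    return best


if __name__ == "__main__":
    x = np.array([0.4, -0.8])
    print(greedy_hybrid_action(x))
    print(brute_force(x, np.linspace(-1.0, 1.0, 41)))
```

For x = (0.4, -0.8), the toy heads give V of about (0.8, 1.4). Mode 0 is therefore selected with u = (-0.2, 0.4), and the exhaustive grid search over both modes arrives at the same action. In a trained network, mu_i would itself be an output of the Q-network. This is presumably what allows the Q-function and the continuous policy to be updated in the same step, as the abstract states.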
ISSN:	2162-237X, 2162-2388
DOI:	10.1109/TNNLS.2022.3156287