Reinforcement Learning Behavioral Control for Nonlinear Autonomous System

Bibliographic details
Published in:	IEEE/CAA Journal of Automatica Sinica, 2022-09, Vol. 9 (9), pp. 1561-1573
Main authors:	Zhang, Zhenyi; Mo, Zhibin; Chen, Yutao; Huang, Jie
Format:	Article
Language:	English
Subjects:	Autonomous systems; Behavioral control; Control systems design; Controllers; Costs; Error signals; Heuristic algorithms; Human intelligence; Learning; mission supervisor; Neural networks; nonlinear autonomous system; Nonlinear control; Optimal control; Optimization; Optimization methods; Reinforcement learning; Simulation; Supervisors

Description
Abstract:	Behavior-based autonomous systems rely on human intelligence to resolve multi-mission conflicts by designing mission priority rules and nonlinear controllers. In this work, a novel two-layer reinforcement learning behavioral control (RLBC) method is proposed to reduce such dependence by trial-and-error learning. Specifically, in the upper layer, a reinforcement learning mission supervisor (RLMS) is designed to learn the optimal mission priority. Compared with existing mission supervisors, the RLMS improves the dynamic performance of mission priority adjustment by maximizing cumulative rewards and reduces hardware storage demand when using neural networks. In the lower layer, a reinforcement learning controller (RLC) is designed to learn the optimal control policy. Compared with existing behavioral controllers, the RLC reduces the control cost of mission priority adjustment by balancing control performance and consumption. All error signals are proved to be semi-globally uniformly ultimately bounded (SGUUB). Simulation results show that the number of mission priority adjustments and the control cost are significantly reduced compared with some existing mission supervisors and behavioral controllers, respectively.
ISSN:	2329-9266, 2329-9274
DOI:	10.1109/JAS.2022.105797