Online inverse reinforcement learning for nonlinear systems with adversarial attacks

Bibliographic Details
Published in: International Journal of Robust and Nonlinear Control, 2021-09, Vol. 31 (14), pp. 6646-6667
Authors: Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Chai, Tianyou
Format: Article
Language: English
Abstract: In the inverse reinforcement learning (RL) problem, there are two agents: a learner agent seeks to mimic an expert agent's state and control-input trajectories by observing the expert's behavior, and these observations are used to reconstruct the expert's unknown performance objective. This article develops novel inverse RL algorithms for the setting in which both agents have continuous-time nonlinear dynamics and suffer from adversarial attacks. We first propose an offline inverse RL algorithm that lets the learner reconstruct the expert's unknown performance objective. This offline algorithm is based on integral RL (IRL) and needs only partial knowledge of the system dynamics; it proceeds in two learning stages, an optimal control stage followed by an inverse-optimal-control stage. Building on the offline algorithm, an online inverse RL algorithm is then developed that solves the inverse RL problem in real time without knowledge of the system drift dynamics. This online adaptive method simultaneously tunes four neural networks (NNs): a critic NN, an actor NN, an adversary NN, and a state penalty NN. Convergence of the algorithms, stability of the learner system, and stability of the synchronously tuned NNs are guaranteed. Simulation examples verify the effectiveness of the online method.
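For orientation, the following is a minimal sketch of the kind of problem setting the abstract describes, assuming the zero-sum-game formulation that is standard in this literature; the dynamics f, g, k, the attack signal d, the state penalty Q(x), the control weight R, and the attenuation level gamma are illustrative assumptions, not notation taken from the article:

    % Assumed continuous-time nonlinear dynamics under an adversarial input d(t);
    % all symbols here are illustrative, not taken from the article itself:
    %   dx/dt = f(x) + g(x) u + k(x) d
    % The expert is assumed to behave optimally for a zero-sum objective
    \[
      J(u, d) = \int_{0}^{\infty} \big( Q(x) + u^{\top} R\, u
                - \gamma^{2}\, d^{\top} d \big)\, \mathrm{d}t .
    \]
    % Inverse RL task: from observed expert trajectories (x, u), reconstruct
    % the unknown state penalty Q(x) under which the expert's policy is
    % optimal. The online method described above would then approximate the
    % value function (critic NN), the control u (actor NN), the worst-case
    % attack d (adversary NN), and Q(x) (state penalty NN), tuning all four
    % simultaneously.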
ISSN: 1049-8923, 1099-1239
DOI: 10.1002/rnc.5626