Inverse Value Iteration and Q-Learning: Algorithms, Stability, and Robustness

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-06, Vol. PP, pp. 1-11
Authors: Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Davoudi, Ali
Format: Article
Language: English
Description
Abstract: This article proposes a data-driven, model-free inverse Q-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent's trajectories of states and optimal control inputs, the algorithm reconstructs the cost function that yields those same trajectories. The article first presents a model-based inverse value iteration scheme that uses the agent's system dynamics. Then, an online model-free inverse Q-learning algorithm is developed to recover the agent's cost function using only the demonstrated trajectories. It is more efficient than existing inverse reinforcement learning (RL) algorithms because it avoids repeated RL runs in inner loops. The proposed algorithms do not require initial stabilizing control policies and yield unbiased solutions. Asymptotic stability, convergence, and robustness of the proposed algorithm are guaranteed. Theoretical analysis and simulation examples show the effectiveness and advantages of the proposed algorithms.
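
For context, the following is a minimal, hypothetical Python sketch of the model-based inverse-LQR idea that the abstract builds on: given known dynamics (A, B), a known input weight R, and a demonstrated optimal feedback gain K, a state-cost matrix Q consistent with that gain can be recovered from the algebraic Riccati equation. This is not the paper's inverse value iteration or model-free inverse Q-learning algorithm; the system, function names, and numbers are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def inverse_lqr_model_based(A, B, R, K):
    """Recover a state-cost matrix Q consistent with an observed optimal gain K.

    Assumes a continuous-time LQR with known (A, B) and known input weight R.
    Sketch: from K = R^{-1} B^T P we have B^T P = R K, which is solved for a
    symmetric P in a least-squares sense; the algebraic Riccati equation then
    gives Q = K^T R K - A^T P - P A.
    """
    n = A.shape[0]
    # Parameterize symmetric P by a basis of symmetric matrices E_ij and
    # assemble the vectorized linear system sum_ij c_ij * (B^T E_ij) = R K.
    basis, cols = [], []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
            cols.append((B.T @ E).ravel())
    M = np.column_stack(cols)
    rhs = (R @ K).ravel()
    coeffs, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    P = sum(c * E for c, E in zip(coeffs, basis))
    Q = K.T @ R @ K - A.T @ P - P @ A
    return Q, P


# Toy demonstration (all numbers illustrative).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
Q_true = np.diag([2.0, 1.0])          # the demonstrator's (unknown) state cost

# "Demonstrated" optimal behaviour: the LQR gain for the true cost.
P_true = solve_continuous_are(A, B, Q_true, R)
K_demo = np.linalg.solve(R, B.T @ P_true)

Q_hat, P_hat = inverse_lqr_model_based(A, B, R, K_demo)

# Check: P_hat satisfies the Riccati equation for (A, B, Q_hat, R) and the
# demonstrated closed loop A - B K_demo is stable, so K_demo is exactly the
# gain produced by the stabilizing Riccati solution for the recovered cost.
residual = A.T @ P_hat + P_hat @ A \
    - P_hat @ B @ np.linalg.solve(R, B.T @ P_hat) + Q_hat
print(np.allclose(residual, np.zeros_like(residual), atol=1e-8))
print(np.all(np.linalg.eigvals(A - B @ K_demo).real < 0))
```

Note that a Q_hat obtained this way reproduces the demonstrated gain but is generally not unique and need not equal the demonstrator's true cost; avoiding such biased solutions, and doing so without knowledge of the dynamics, is what the article's inverse value iteration and model-free inverse Q-learning algorithms address.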
ISSN: 2162-237X
2162-2388
DOI: 10.1109/TNNLS.2024.3409182