Inverse Value Iteration and Q-Learning: Algorithms, Stability, and Robustness
Saved in:
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-06, Vol. PP, pp. 1-11
Main Authors:
Format: Article
Language: English
Keywords:
Online Access: Order full text
Abstract: This article proposes a data-driven, model-free inverse Q-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent's trajectories of states and optimal control inputs, the algorithm reconstructs a cost function that reproduces the same trajectories. The article first poses a model-based inverse value iteration scheme that uses the agent's system dynamics. An online model-free inverse Q-learning algorithm is then developed to recover the agent's cost function from the demonstrated trajectories alone. It is more efficient than existing inverse reinforcement learning (RL) algorithms because it avoids repeatedly solving an RL problem in inner loops. The proposed algorithms require no initial stabilizing control policies and yield unbiased solutions. Asymptotic stability, convergence, and robustness of the proposed algorithm are guaranteed. Theoretical analysis and simulation examples demonstrate the effectiveness and advantages of the proposed algorithms.
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2024.3409182
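
The abstract's first step, a model-based inverse value iteration that recovers a cost consistent with a demonstrated optimal policy, can be illustrated with a basic inverse-LQR computation. The sketch below is not the paper's algorithm (which is online and model-free); it only shows, under assumed dynamics A, B and an assumed input weight R, how a state-cost matrix Q can be recovered from a demonstrated optimal gain K via the continuous-time optimality conditions B^T P = R K and A^T P + P A - K^T R K + Q = 0. All concrete matrices and function names are illustrative assumptions, not values from the paper.

```python
# Minimal inverse-LQR sketch (not the paper's model-free algorithm): recover a state-cost
# matrix Q that makes a demonstrated gain K the optimal LQR gain for known dynamics (A, B)
# and an assumed input weight R. All concrete matrices below are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are


def recover_state_cost(A, B, K, R):
    """Solve B^T P = R K for a symmetric P (least squares), then Q = K^T R K - A^T P - P A."""
    n = A.shape[0]
    rows, cols = np.triu_indices(n)
    columns = []
    for i, j in zip(rows, cols):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0                      # symmetric basis matrix for entry P_ij
        columns.append((B.T @ E).ravel())            # contribution of P_ij to vec(B^T P)
    M = np.column_stack(columns)
    s, *_ = np.linalg.lstsq(M, (R @ K).ravel(), rcond=None)
    P = np.zeros((n, n))
    P[rows, cols] = s
    P = P + P.T - np.diag(np.diag(P))                # rebuild the full symmetric matrix
    Q = K.T @ R @ K - A.T @ P - P @ A                # rearranged continuous-time Riccati equation
    return 0.5 * (Q + Q.T), P


if __name__ == "__main__":
    # Toy system with full-rank B so the recovery is unique (assumption for illustration).
    A = np.array([[0.0, 1.0], [-1.0, -1.0]])
    B = np.eye(2)
    R = np.eye(2)
    Q_true = np.diag([2.0, 1.0])

    # Generate a "demonstrated" optimal gain from the true (hidden) cost.
    P_true = solve_continuous_are(A, B, Q_true, R)
    K_demo = np.linalg.solve(R, B.T @ P_true)

    # Recover the cost and verify it reproduces the same optimal gain.
    Q_hat, _ = recover_state_cost(A, B, K_demo, R)
    K_check = np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q_hat, R))
    print("recovered Q:\n", np.round(Q_hat, 6))
    print("gain error:", np.linalg.norm(K_check - K_demo))
```

When there are fewer inputs than states, the symmetric P satisfying B^T P = R K is not unique, so the recovered Q is not unique either; the sketch simply returns the least-squares choice. This ambiguity is the kind of issue the article's emphasis on unbiased solutions speaks to, and its model-free algorithm avoids needing A and B at all.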