Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2023-02, Vol. 70, No. 2, pp. 910-920
Main authors: , , ,
Format: Article
Language: English
Abstract: In this paper, multi-player non-zero-sum games with control constraints are studied using a novel model-free approach based on the adaptive dynamic programming (ADP) framework. First, a model-based policy iteration (PI) method, which requires knowledge of the system dynamics, is presented and its convergence is proved. Then, to eliminate the need for the system dynamics, a model-free iterative method is derived from the PI approach via an off-policy integral reinforcement learning (IRL) scheme, which is constructed from collected system data. The convergence of the off-policy IRL approach is established by proving its equivalence to the model-based iterative approach. In the implementation of the scheme, the control policies and cost functions are approximated by actor-critic networks, whose weights are learned from the collected data sets by a least-squares algorithm. Finally, two cases demonstrate the effectiveness of the established framework.
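For context, the off-policy IRL construction referenced in the abstract typically rests on an integral Bellman identity. A representative form, with notation assumed from the standard literature rather than quoted from this paper: for player $i$ at iteration $k$, with arbitrary behavior inputs $u_j$ applied to the system $\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)u_j$,

$$
V_i^{k}(x(t+T)) - V_i^{k}(x(t)) = -\int_t^{t+T}\Big(Q_i(x) + \sum_{j=1}^{N} W_j(u_j^{k})\Big)\,d\tau
+ \int_t^{t+T}\sum_{j=1}^{N}\big(\nabla V_i^{k}\big)^{\top} g_j(x)\,\big(u_j - u_j^{k}\big)\,d\tau,
$$

where the input constraints are commonly handled by the nonquadratic penalty $W_j(u_j) = 2\int_0^{u_j} \lambda_j \tanh^{-1}(v/\lambda_j)^{\top} R_{jj}\,dv$ and the bounded policy update $u_j^{k+1} = -\lambda_j \tanh\big(\tfrac{1}{2\lambda_j} R_{jj}^{-1} g_j^{\top}(x)\nabla V_j^{k}\big)$. Because the identity holds along trajectories generated by any behavior input, it can be solved from recorded data without identifying $f$ or $g_j$.

The least-squares weight learning mentioned in the abstract can be sketched with a critic approximation $V_i(x) \approx w^{\top}\phi(x)$. The following is a minimal illustration, not the paper's implementation: the basis $\phi$, the horizon $T$, the toy system, and all identifiers are assumptions, and the demo uses the on-policy special case in which the off-policy correction term vanishes (behavior policy equals the evaluated policy).

import numpy as np

# Minimal sketch of the critic least-squares fit from collected trajectory data
# (all names and the toy system are hypothetical, for illustration only).

def phi(x):
    # Assumed quadratic critic basis for a 2-D state.
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def fit_critic(data):
    # Each sample (x_t, x_tpT, r_int) encodes one integral Bellman residual:
    #   (phi(x_t) - phi(x_{t+T}))^T w = integral of the running cost over [t, t+T].
    A = np.vstack([phi(x_t) - phi(x_tpT) for x_t, x_tpT, _ in data])
    b = np.array([r for _, _, r in data])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # batch least squares over all samples
    return w

if __name__ == "__main__":
    # Toy data source: 2-D stable linear system with a fixed behavior policy
    # u = -K x; since behavior and evaluated policy coincide, the off-policy
    # correction term is zero and the residuals above are exact up to
    # discretization error.
    A_sys = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B_sys = np.array([[0.0], [1.0]])
    K = np.array([[1.0, 1.0]])
    dt, T = 1e-3, 0.05
    rng = np.random.default_rng(0)
    data = []
    for _ in range(100):
        x = rng.uniform(-1.0, 1.0, size=2)   # random initial state
        x_t, r_int = x.copy(), 0.0
        for _ in range(int(T / dt)):         # Euler integration over [t, t+T]
            u = -(K @ x)
            r_int += (x @ x + u @ u) * dt    # running cost x^T Q x + u^T R u, Q = R = I
            x = x + (A_sys @ x + B_sys @ u) * dt
        data.append((x_t, x.copy(), r_int))
    print("critic weights:", fit_critic(data))

Stacking one row per data sample and solving all integral Bellman residuals in a single batch least-squares call is what makes the scheme model-free: only recorded trajectory segments enter the computation.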
ISSN: 1549-8328 (print); 1558-0806 (electronic)
DOI: 10.1109/TCSI.2022.3221274