Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2023-02, Vol. 70 (2), p. 910-920
Main authors: Huo, Yu, Wang, Ding, Qiao, Junfei, Li, Menghua
Format: Article
Language: English
Description
Abstract: In this paper, multi-player non-zero-sum games with control constraints are studied by utilizing a novel model-free approach based on the adaptive dynamic programming framework. First, a model-based policy iteration (PI) method, which requires the system dynamics, is provided and its convergence is demonstrated. Then, to eliminate the need for the system dynamics, a model-free iterative method is derived by applying the off-policy integral reinforcement learning (IRL) scheme to the PI approach, with system data collected to construct the model-free updates. The convergence of the off-policy IRL approach is analyzed by proving the equivalence between the model-free and the model-based iterative approaches. In the implementation of the scheme, the control policy and the cost function are approximated by actor and critic networks, whose weights are learned with a least-squares algorithm from the collected data sets. Finally, two cases are provided to demonstrate the effectiveness of the established framework.
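
To make the least-squares weight-learning step concrete, below is a minimal sketch of a batch critic update fitted from collected trajectory data, in the spirit of the IRL-style regression the abstract describes. The quadratic basis phi, the toy transition data, and the synthetic integrated stage cost are all illustrative assumptions for a single player, not the paper's exact construction.

```python
# Minimal sketch: fit critic weights w for V(x) ~= w^T phi(x) by batch
# least squares over collected (x_t, x_{t+T}, integral-cost) samples.
# The basis phi, the toy dynamics, and the stage cost are assumptions
# made for illustration only.
import numpy as np

def phi(x):
    # Hypothetical quadratic basis for a 2-D state: [x1^2, x1*x2, x2^2].
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

rng = np.random.default_rng(0)

# Collected data set: state pairs sampled T seconds apart and the
# integral of the stage cost over [t, t+T], generated synthetically here.
num_samples = 50
X_t = rng.uniform(-1.0, 1.0, size=(num_samples, 2))
X_tT = 0.9 * X_t + 0.01 * rng.normal(size=(num_samples, 2))  # toy transition
integral_cost = 0.1 * np.sum(X_t**2, axis=1)                 # toy stage cost

# IRL-style temporal-difference regression:
#   w^T (phi(x_t) - phi(x_{t+T})) = integral_cost,
# stacked over all samples and solved in a batch least-squares sense.
A = np.array([phi(a) - phi(b) for a, b in zip(X_t, X_tT)])
w, *_ = np.linalg.lstsq(A, integral_cost, rcond=None)

print("critic weights:", w)
print("V(x) at x=[0.5, -0.3]:", w @ phi(np.array([0.5, -0.3])))
```

Because the regression only needs stored state pairs and integrated costs, the same batch solve works for data gathered under an arbitrary behavior policy, which is what makes the off-policy, model-free formulation possible.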
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2022.3221274