Approximate Nash Solutions for Multiplayer Mixed-Zero-Sum Game With Reinforcement Learning

Bibliographic details
Published in:	IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019-12, Vol.49 (12), p.2739-2750
Main authors:	Lv, Yongfeng; Ren, Xuemei
Format:	Article
Language:	English
Subjects:	Algorithms; Approximate dynamic programming (ADP); Approximation algorithms; Artificial neural networks; Computer simulation; Economic models; Equilibrium; Game theory; Games; Heuristic algorithms; Machine learning; Nash equilibrium; Nash games; Neural networks; neural networks (NNs); Nonlinear dynamics; Optimal control; Performance analysis; Performance indices; Players; reinforcement learning (RL); system identification; Zero sum games

Description
Abstract:	Inspired by Nash game theory, this paper proposes a multiplayer mixed-zero-sum (MZS) nonlinear game that covers both zero-sum and nonzero-sum (NZS) Nash games. A synchronous reinforcement learning (RL) scheme based on the identifier-critic structure is developed to learn the Nash equilibrium solution of the proposed MZS game. First, the MZS game formulation is presented: performance indexes are defined for the NZS Nash game among players 1 to N - 1 and player N, and another performance index is defined for the zero-sum game between players N and N + 1, so that player N cooperates with players 1 to N - 1 while competing with player N + 1, which leads to a Nash equilibrium of all players. A single-layer neural network (NN) is then used to approximate the unknown dynamics of the nonlinear game system. Finally, an RL scheme based on NNs is developed to learn the optimal performance indexes, which can be used to produce the optimal control policy of every player such that the Nash equilibrium can be obtained. Thus, the actor NN widely used in the RL literature is not needed. To this end, a recently proposed adaptive law is used to estimate the unknown identifier coefficient vectors, and an improved adaptive law with the error performance index is further developed to update the critic coefficient vectors. Both linear and nonlinear simulations are presented to demonstrate the existence of a Nash equilibrium for the MZS game and the performance of the proposed algorithm.
ISSN:	2168-2216, 2168-2232
DOI:	10.1109/TSMC.2018.2861826