Achieving Correlated Equilibrium by Studying Opponent's Behavior Through Policy-Based Deep Reinforcement Learning
Published in: IEEE Access, 2020, Vol. 8, pp. 199682-199695
Format: Article
Language: English
Online access: Full text
Abstract: Game theory is a profound study of distributed decision-making behavior and has been extensively developed by many scholars. However, many existing works rely on strict assumptions, such as knowledge of the opponent's private behaviors, which might not be practical. In this work, we focused on two Nobel Prize-winning concepts: the Nash equilibrium and the correlated equilibrium. We proposed a policy-based deep reinforcement learning model which, instead of merely learning the regions for corresponding strategies and actions, learns why and how the rational opponent plays. With our proposed policy-based deep reinforcement learning model, we successfully reached the correlated equilibrium that maximizes the utility for each player. Depending on the scenario, this equilibrium can lie outside the convex hull of the Nash equilibria and thereby achieve higher utility for the players, which traditional no-regret algorithms cannot. In addition, we proposed a mathematical model that inverts the calculation of the correlated equilibrium probability to estimate the rational opponent's payoff. Through simulations with limited interaction among the players, we showed that our proposed method can achieve the optimal correlated equilibrium, where each player gains a utility equal to or higher than that of the Nash equilibrium.
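To make the convex-hull claim concrete, the following is a minimal sketch (not the authors' implementation) of computing a welfare-maximizing correlated equilibrium by linear programming for the classic game of Chicken; the payoff matrix, action labels, and use of SciPy's `linprog` are illustrative assumptions, not details taken from the paper. For this game the optimal CE pays each player 5.25, which lies outside the convex hull of the Nash equilibrium payoffs (7, 2), (2, 7), and roughly (4.67, 4.67) for the mixed equilibrium.

```python
# Hypothetical example: welfare-maximizing correlated equilibrium (CE)
# of the 2x2 game of Chicken via linear programming.
import numpy as np
from scipy.optimize import linprog

# Payoffs with actions 0 = Dare, 1 = Chicken; u1[a1, a2] is the row
# player's payoff, and the game is symmetric, so u2 is the transpose.
u1 = np.array([[0, 7],
               [2, 6]])
u2 = u1.T
n = 2  # actions per player

# Decision variable: joint distribution p[a1, a2], flattened to length 4.
# Objective: maximize total expected payoff, i.e. minimize its negative.
c = -(u1 + u2).flatten()

# CE incentive constraints: for each player, recommended action a, and
# deviation a':  sum_b p(a, b) * (u(a, b) - u(a', b)) >= 0,
# written here in linprog's "<= 0" form.
A_ub, b_ub = [], []
for a in range(n):              # row player's recommendations
    for a_dev in range(n):
        if a_dev == a:
            continue
        row = np.zeros((n, n))
        row[a, :] = u1[a_dev, :] - u1[a, :]
        A_ub.append(row.flatten()); b_ub.append(0.0)
for a in range(n):              # column player's recommendations
    for a_dev in range(n):
        if a_dev == a:
            continue
        row = np.zeros((n, n))
        row[:, a] = u2[:, a_dev] - u2[:, a]
        A_ub.append(row.flatten()); b_ub.append(0.0)

# Probabilities are nonnegative and sum to one.
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.ones((1, n * n)), b_eq=[1.0],
              bounds=[(0, 1)] * (n * n))
p = res.x.reshape(n, n)
# Expected result: p(C,C) = 0.5, p(C,D) = p(D,C) = 0.25, p(D,D) = 0,
# giving each player 5.25 in expectation.
print("CE distribution:\n", p.round(3))
print("Expected payoffs:", (p * u1).sum().round(3), (p * u2).sum().round(3))
```

A Nash equilibrium is the special case where the joint distribution factors into independent per-player strategies; the LP above searches over all correlated distributions, which is why its optimum can dominate every point in the Nash convex hull.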
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3035362