Reinforcement Learning with Restrictions on the Action Set

Consider a two-player normal-form game repeated over time. We introduce an adaptive learning procedure, where the players only observe their own realized payoff at each stage. We assume that agents do not know their own payoff function and have no information on the other player. Furthermore, we ass...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:SIAM journal on control and optimization 2015-01, Vol.53 (1), p.287-312
Hauptverfasser: Bravo, Mario, Faure, Mathieu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Consider a two-player normal-form game repeated over time. We introduce an adaptive learning procedure, where the players only observe their own realized payoff at each stage. We assume that agents do not know their own payoff function and have no information on the other player. Furthermore, we assume that they have restrictions on their own actions such that, at each stage, their choice is limited to a subset of their action set. We prove that the empirical distributions of play converge to the set of Nash equilibria for zero-sum and potential games, and games where one player has two actions.
ISSN:0363-0129
1095-7138
DOI:10.1137/130936488