Decentralized Q-Learning for Stochastic Teams and Games

Bibliographic details
Published in: IEEE Transactions on Automatic Control, April 2017, Vol. 62, No. 4, pp. 1545-1558
Main authors: Arslan, Gurdal; Yuksel, Serdar
Format: Article
Language: English
Description
Abstract: There are only a few learning algorithms applicable to stochastic dynamic teams and games which generalize Markov decision processes to decentralized stochastic control problems involving possibly self-interested decision makers. Learning in games is generally difficult because of the non-stationary environment in which each decision maker aims to learn its optimal decisions with minimal information in the presence of the other decision makers who are also learning. In stochastic dynamic games, learning is more challenging because, while learning, the decision makers alter the state of the system and hence the future cost. In this paper, we present decentralized Q-learning algorithms for stochastic games, and study their convergence for the weakly acyclic case which includes team problems as an important special case. The algorithms are decentralized in that each decision maker has access only to its own decisions and cost realizations as well as the state transitions; in particular, each decision maker is completely oblivious to the presence of the other decision makers. We show that these algorithms converge to equilibrium policies almost surely in large classes of stochastic games.
ISSN: 0018-9286, 1558-2523
DOI: 10.1109/TAC.2016.2598476
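The abstract describes the information structure: each decision maker sees only the state transitions, its own decisions and its own costs. It does not give the update rules. The Python sketch below illustrates that information structure under stated assumptions. Two learners each run tabular Q-learning over (state, own action) pairs, and neither observes the other. The abstract points to the non-stationarity caused by other decision makers who are also learning. To limit it, each learner here freezes a baseline policy during an exploration phase and revises that policy only at phase boundaries, with inertia. Everything in the sketch is a hypothetical choice for illustration, not a transcription of the paper's algorithms. That includes the two-state team model, the phase structure, the parameters (`GAMMA`, `rho`, `inertia`, `delta`, `phases`, `phase_len`) and the names (`team_step`, `IndependentLearner`, `run`).

```python
import random

# Hypothetical two-agent stochastic team (all numbers are illustrative
# assumptions): states {0, 1}, actions {0, 1}, one common cost.
STATES = (0, 1)
ACTIONS = (0, 1)
GAMMA = 0.3  # discount factor (assumed)


def team_step(state, a1, a2, rng):
    """Environment only: returns (common cost, next state)."""
    if a1 != a2:
        cost = 1.0  # miscoordination
    elif a1 == state:
        cost = 0.0  # coordinate on the action matching the state
    else:
        cost = 0.5  # coordinate on the other action
    # Actions affect the state: agreeing flips it, otherwise it is random.
    next_state = 1 - state if a1 == a2 else rng.choice(STATES)
    return cost, next_state


class IndependentLearner:
    """Sees only the state, its own action and the cost it incurs."""

    def __init__(self, seed, rho=0.1, inertia=0.5, delta=0.05):
        self.rng = random.Random(seed)
        self.rho, self.inertia, self.delta = rho, inertia, delta
        # Baseline (deterministic stationary) policy: state -> own action.
        self.policy = {s: self.rng.choice(ACTIONS) for s in STATES}
        self._reset_q()

    def _reset_q(self):
        self.q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
        self.n = {(s, a): 0 for s in STATES for a in ACTIONS}

    def act(self, state):
        # Follow the frozen baseline; explore uniformly with probability rho.
        if self.rng.random() < self.rho:
            return self.rng.choice(ACTIONS)
        return self.policy[state]

    def learn(self, s, a, cost, s_next):
        # Q-learning on (state, own action) with step size 1/n (cost minimization).
        self.n[(s, a)] += 1
        target = cost + GAMMA * min(self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] += (target - self.q[(s, a)]) / self.n[(s, a)]

    def end_phase(self):
        # Revise the baseline only where it is not delta-optimal, with inertia.
        for s in STATES:
            best = min(self.q[(s, b)] for b in ACTIONS)
            if (self.q[(s, self.policy[s])] > best + self.delta
                    and self.rng.random() > self.inertia):
                near_best = [b for b in ACTIONS if self.q[(s, b)] <= best + self.delta]
                self.policy[s] = self.rng.choice(near_best)
        self._reset_q()  # the other agent may have changed, so relearn


def run(phases=25, phase_len=4000, seed=0):
    env_rng = random.Random(seed)
    agents = [IndependentLearner(seed + 1), IndependentLearner(seed + 2)]
    state = env_rng.choice(STATES)
    for _ in range(phases):
        for _ in range(phase_len):
            a1, a2 = agents[0].act(state), agents[1].act(state)
            cost, nxt = team_step(state, a1, a2, env_rng)
            # Each learner is updated with its OWN action only.
            agents[0].learn(state, a1, cost, nxt)
            agents[1].learn(state, a2, cost, nxt)
            state = nxt
        for agent in agents:
            agent.end_phase()
    return agents
```

Calling `run()` returns the two learners. In this hypothetical coordination-type team, the expected outcome is that their baseline actions agree in every state. Such a joint policy is an equilibrium of the team, though possibly the suboptimal one that coordinates on the 0.5-cost action. The sketch is consistent with the abstract's claim of convergence to equilibrium policies, not to a team optimum.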