Episodic Logit-Q Dynamics for Efficient Learning in Stochastic Teams
Saved in:
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present new learning dynamics combining (independent) log-linear learning and value iteration for stochastic games within the auxiliary stage game framework. The dynamics presented provably attain the efficient equilibrium (also known as the optimal equilibrium) in identical-interest stochastic games, going beyond the recent concentration of progress on provable convergence to some (possibly inefficient) equilibrium. The dynamics are also independent in the sense that agents take actions consistent with their local viewpoints to a reasonable extent rather than seeking equilibrium. These aspects can be of practical interest in control applications of intelligent and autonomous systems. The key challenges are convergence to an inefficient equilibrium and the non-stationarity of the environment from a single agent's viewpoint due to the adaptation of others. The log-linear update plays an important role in addressing the former. We address the latter through a play-in-episodes scheme in which agents update their Q-function estimates only at the end of each episode. |
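As a rough illustration of the two mechanisms named in the summary (not the paper's actual algorithm), a log-linear update samples actions softmax-style from Q-value estimates, and the play-in-episodes scheme holds Q fixed during an episode and applies updates only afterwards. The sketch below uses a standard Q-learning target in place of the paper's value-iteration details; all function names are hypothetical:

```python
import math
import random

def logit_choice(q_row, tau=0.5, rng=random):
    """Log-linear (logit) action selection: P(a) is proportional to
    exp(q_row[a] / tau), where tau is a temperature parameter."""
    m = max(q_row)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_row]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for a, w in enumerate(weights):
        acc += w
        if r < acc:
            return a
    return len(q_row) - 1

def end_of_episode_update(q, transitions, alpha=0.1, gamma=0.9):
    """Play-in-episodes idea: Q is frozen while the episode is played;
    the recorded transitions are applied only after the episode ends.
    (A Q-learning-style target is assumed here for concreteness.)"""
    for (s, a, r, s_next) in transitions:
        best_next = max(q[s_next])
        q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q
```

With a very low temperature the logit choice concentrates on the greedy action, which is the mechanism the abstract credits with steering play toward the efficient equilibrium.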
DOI: | 10.48550/arxiv.2309.02675 |