On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Bibliographic details
Published in: Mathematics of Operations Research, 2013-05, Vol. 38 (2), pp. 209-227
Main authors: Yu, Huizhen; Bertsekas, Dimitri P.
Format: Article
Language: English
Online access: Full text
Description
Abstract: We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185-202] and establishing completely the convergence of Q-learning for these SSP models.
ISSN: 0364-765X, 1526-5471
DOI: 10.1287/moor.1120.0562
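
For context, the iterates referred to in the abstract are those of the standard asynchronous Q-learning iteration for SSP problems; the following is a minimal sketch in conventional SSP notation (the symbols $g$, $U(\cdot)$, $\gamma_t$, and $j_t$ are standard choices, not quoted from the article):

\[
Q_{t+1}(i,u) = \bigl(1 - \gamma_t(i,u)\bigr)\, Q_t(i,u) + \gamma_t(i,u)\Bigl( g(i,u,j_t) + \min_{v \in U(j_t)} Q_t(j_t, v) \Bigr),
\]

where $j_t$ is a successor state sampled according to the transition probabilities of control $u$ at state $i$, $g(i,u,j_t)$ is the one-stage cost, $\gamma_t(i,u)$ is a stepsize, and $Q_t(\bar{0},\cdot) \equiv 0$ at the absorbing, cost-free state $\bar{0}$; state-control pairs $(i,u)$ may be updated asynchronously and with outdated information. The boundedness result of the paper concerns the sequence $\{Q_t\}$ generated by such an iteration.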