Q-Learning for Risk-Sensitive Control

Bibliographic Details
Published in:	Mathematics of Operations Research, 2002-05, Vol. 27 (2), p. 294-311
Main author:	Borkar, V. S.
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Approximation; Average cost; Cost control; Dynamic programming; Machine learning; Markov analysis; Markov chains; Markov decision processes; Markov processes; Mathematics; Operations research; Q-learning; reinforcement learning; risk-sensitive control; Stochastic approximation; Stochastic models; Trajectories
Online access:	Full text

Description
Abstract:	We propose for risk-sensitive control of finite Markov chains a counterpart of the popular Q-learning algorithm for classical Markov decision processes. The algorithm is shown to converge with probability one to the desired solution. The proof technique is an adaptation of the o.d.e. approach for the analysis of stochastic approximation algorithms, with most of the work involved used for the analysis of the specific o.d.e.s that arise.
ISSN:	0364-765X, 1526-5471
DOI:	10.1287/moor.27.2.294.324