Discounting, Ergodicity and Convergence for Markov Decision Processes

Bibliographic details
Published in:	Management Science, 1977-04, Vol. 23 (8), pp. 890-900
Authors:	Morton, Thomas E.; Wecker, William E.
Format:	Article
Language:	English
Keywords:	Dynamic programming; Eigenvalues; Ergodic theory; Management science; Markov analysis; Markov chains; Markov models; Markov processes; Mathematical theorems; Optimal policy; Perceptron convergence procedure

Description
Abstract:	The rate at which Markov decision processes converge as the horizon length increases can be important for computations and for judging the appropriateness of models. The convergence rate is commonly associated with the discount factor $\alpha$. For example, the total value function for a broad set of problems is known to converge $O(\alpha^n)$, i.e., geometrically with the discount factor. But the rate at which the finite-horizon optimal policies converge depends on the convergence of the relative value function. (Relative value at a given state is the difference between total value at that state and total value at some fixed reference state.) Relative value convergence in turn depends both on the discount factor and on ergodic properties of the underlying nonhomogeneous Markov chains. We show in particular that for the stationary finite-state-space, compact-action-space Markov decision problem, the relative value function converges $O((\alpha\lambda)^n)$ for all $\lambda > r(P)$, the argument of the subdominant eigenvalue of the optimal infinite-horizon policy (assumed unique). Easily obtained bounds for $r(P)$ are also given which are related to those of A. Brauer. Under additional restrictions, policy convergence is shown to be of the same order as relative value convergence, generalizing work of Shapiro, Schweitzer, and Odoni. The same result gives convergence properties for the undiscounted problem and for the case $\alpha > 1$. If $\alpha r(P) > 1$ the problem does not converge. As a by-product of the analysis, necessary conditions are given for the relative value function to converge $O((\alpha\lambda)^n)$, $0 < \lambda < 1$, for the nonstationary problem.
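The abstract's central claim (relative values converge geometrically at a rate governed by the discount factor times the subdominant eigenvalue modulus $r(P)$) can be checked numerically in the simplest setting. The sketch below is illustrative only and assumes a single fixed stationary policy, i.e. a Markov reward process with transition matrix P and one-step value vector c, rather than the optimal policy over a compact action space that the paper treats. The 3-state chain, the vector c, and all function names are hypothetical. The Dobrushin ergodicity coefficient is a standard, cheaply computed upper bound on $r(P)$ substituted here for illustration; it is not the Brauer-related bound given in the paper.

```python
import numpy as np


def subdominant_modulus(P: np.ndarray) -> float:
    """r(P): second-largest eigenvalue modulus of a stochastic matrix P.

    The largest modulus is the Perron root 1; this assumes that root is
    simple (e.g. an irreducible chain), so the second sorted entry is r(P).
    """
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])


def dobrushin_coefficient(P: np.ndarray) -> float:
    """tau(P) = 1/2 * max_{i,j} sum_k |P[i,k] - P[j,k]|.

    Every eigenvalue of P other than 1 has modulus <= tau(P), so this is a
    cheap upper bound on r(P). (A standard bound, not the paper's own.)
    """
    n = P.shape[0]
    return float(0.5 * max(np.abs(P[i] - P[j]).sum()
                           for i in range(n) for j in range(n)))


def relative_value_increments(P, c, alpha, horizon, ref=0):
    """Max-norms of h_n - h_{n-1} for a fixed policy (hypothetical helper).

    v_n = c + alpha * P v_{n-1} with v_0 = 0 is the n-horizon total value,
    and h_n = v_n - v_n[ref] is the relative value w.r.t. state `ref`.
    List index n-1 holds the increment at horizon n.
    """
    v = np.zeros(len(c))
    h_prev = v.copy()
    norms = []
    for _ in range(horizon):
        v = c + alpha * (P @ v)
        h = v - v[ref]
        norms.append(float(np.max(np.abs(h - h_prev))))
        h_prev = h
    return norms


# Hypothetical 3-state example: symmetric P (so all eigenvalues are real).
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.2, 0.7]])
c = np.array([1.0, 0.0, 2.0])
alpha = 0.9

r = subdominant_modulus(P)
tau = dobrushin_coefficient(P)
d = relative_value_increments(P, c, alpha, horizon=30)
# Geometric-mean rate of the increments between horizons 15 and 25.
empirical_rate = (d[24] / d[14]) ** (1 / 10)

print(f"r(P) = {r:.4f}, tau(P) = {tau:.4f}")
print(f"alpha * r(P) = {alpha * r:.4f}, empirical rate = {empirical_rate:.4f}")
```

For this symmetric P the eigenvalues are 1 and $0.4 \pm \sqrt{0.03}$, so $r(P) \approx 0.573$ and $\tau(P) = 0.6$ bounds it from above. Because $v_n - v_{n-1} = \alpha^{n-1} P^{n-1} c$ and the constant component cancels in the relative value, the increments should shrink at a rate close to $\alpha r(P) \approx 0.516$, whereas total values converge only at rate $\alpha = 0.9$.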
ISSN:	0025-1909 (print), 1526-5501 (online)
DOI:	10.1287/mnsc.23.8.890