Complexity of finite-horizon Markov decision process problems

Controlled stochastic systems occur in science engineering, manufacturing, social sciences, and many other cntexts. If the systems is modeled as a Markov decision process (MDP) and will run ad infinitum , the optimal control policy can be computed in polynomial time using linear programming. The pro...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Journal of the ACM 2000-07, Vol.47 (4), p.681-720
Hauptverfasser:	MUNDHENK, MARTIN, Goldsmith, Judy, LUSENA, CHRISTOPHER, ALLENDER, ERIC
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Computer programming Decision analysis Linear programming Markov analysis Markov processes Social sciences Stochastic systems Studies Theory
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Controlled stochastic systems occur in science engineering, manufacturing, social sciences, and many other cntexts. If the systems is modeled as a Markov decision process (MDP) and will run ad infinitum , the optimal control policy can be computed in polynomial time using linear programming. The problems considered here assume that the time that the process will run is finite, and based on the size of the input. There are mny factors that compound the complexity of computing the optimal policy. For instance, there are many factors that compound the complexity of this computation. For instance, if the controller does not have complete information about the state of the system, or if the system is represented in some very succint manner, the optimal policy is provably not computable in time polynomial in the size of the input. We analyze the computational complexity of evaluating policies and of determining whether a sufficiently good policy exists for a MDP, based on a number of confounding factors, including the observability of the system state; the succinctness of the representation; the type of policy; even the number of actions relative to the number of states. In almost every case, we show that the decision problem is complete for some known complexity class. Some of these results are familiar from work by Papadimitriou and Tsitsiklis and others, but some, such as our PL-completeness proofs, are surprising. We include proofs of completeness for natural problems in the as yet little-studied classes NP PP .
ISSN:	0004-5411 1557-735X
DOI:	10.1145/347476.347480