The value iteration algorithm is not strongly polynomial for discounted dynamic programming

Bibliographic details
Published in:	Operations Research Letters 2014-03, Vol. 42 (2), pp. 130-131
Authors:	Feinberg, Eugene A., Huang, Jefferson
Format:	Article
Language:	English
Keywords:	Algorithm, Algorithms, Computation, Dynamic programming, Iterative methods, Markov decision process, Operations research, Optimization, Policies, Policy, Polynomials, Strongly polynomial, Value iteration

Description
Abstract:	This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iteration, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
ISSN:	0167-6377 1872-7468
DOI:	10.1016/j.orl.2013.12.011
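The sketch below illustrates the general phenomenon described in the abstract; it is not the construction used in the paper. The three-state MDP, its rewards, the discount factor beta = 1/2, and the function name `iterations_until_optimal` are all hypothetical choices made for illustration. Starting value iteration from v_0 = 0 with exact rational arithmetic, the greedy action in state 0 becomes optimal at the first k with (1 + eps)(1 - beta^k) > 1, that is, beta^k < eps/(1 + eps). The number of states and actions stays fixed, yet the iteration count grows without bound as the optimality margin eps shrinks, so it cannot be bounded by any function of the problem size alone.

```python
from fractions import Fraction


def iterations_until_optimal(eps, beta=Fraction(1, 2)):
    """Return the first k such that the greedy policy w.r.t. v_k is optimal.

    Hypothetical 3-state MDP (illustration only, not the paper's example):
      state 0: action 'a' -> reward 1, go to state 1
               action 'b' -> reward 0, go to state 2
      state 1: absorbing, reward 0 per step
      state 2: absorbing, reward c per step
    c is chosen so that 'b' is worth 1 + eps and 'a' is worth 1,
    i.e. 'b' is optimal in state 0 by a margin of eps.
    """
    # beta * c / (1 - beta) = 1 + eps
    c = (1 + eps) * (1 - beta) / beta
    # Start value iteration from the zero vector, using exact rationals.
    v = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    k = 0
    while True:
        q_a = 1 + beta * v[1]  # one-step lookahead value of action 'a'
        q_b = 0 + beta * v[2]  # one-step lookahead value of action 'b'
        if q_b > q_a:  # greedy policy now picks the optimal action 'b'
            return k
        # Exact Bellman update for every state.
        v = {0: max(q_a, q_b), 1: 0 + beta * v[1], 2: c + beta * v[2]}
        k += 1


if __name__ == "__main__":
    for eps in (Fraction(1), Fraction(1, 1000), Fraction(1, 10**6)):
        print(eps, iterations_until_optimal(eps))
```

With beta = 1/2, the margins 1, 1/1000, and 1/10^6 require 2, 10, and 20 iterations respectively. Halving eps adds roughly one more iteration, so the count is unbounded for this fixed-size problem. Exact `Fraction` arithmetic is used so that the comparison `q_b > q_a` reflects the exact-computation model mentioned in the abstract rather than floating-point rounding.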