Constrained Undiscounted Stochastic Dynamic Programming

Bibliographic Details
Published in: Mathematics of Operations Research 1984-05, Vol. 9 (2), pp. 276-289
Authors: Hordijk, A.; Kallenberg, L. C. M.
Format: Article
Language: English
Online access: Full text
Abstract: In this paper we investigate the computation of optimal policies in constrained discrete stochastic dynamic programming with the average reward as utility function. The state space and action sets are assumed to be finite. Constraints which are linear functions of the state-action frequencies are allowed. In the general multichain case, an optimal policy will be a randomized nonstationary policy. An algorithm to compute such an optimal policy is presented. Furthermore, sufficient conditions for optimal policies to be stationary are derived. There are many applications for constrained undiscounted stochastic dynamic programming, e.g., in multiple objective Markovian decision models.
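The state-action frequencies mentioned in the abstract are the natural variables of a linear program. As a minimal sketch of that idea (not the paper's multichain algorithm), the code below solves the unichain special case, where a stationary randomized policy can be read off the LP solution; all transition probabilities, rewards, costs, and the budget are invented for illustration.

```python
# Sketch: constrained average-reward MDP via the state-action-frequency LP,
# unichain case only. The data below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

# Toy MDP: 2 states, 2 actions. P[s, a, j] = transition probability s -> j.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 2.0],    # r[s, a] = expected one-step reward
              [0.5, 3.0]])
c = np.array([[0.0, 1.0],    # c[s, a] = one-step cost (illustrative)
              [0.0, 2.0]])
budget = 0.75                # constraint: long-run average cost <= budget

nS, nA = r.shape
nvar = nS * nA               # variables x[s, a]: state-action frequencies

# Maximize the average reward  =>  minimize -sum_{s,a} r[s,a] x[s,a].
obj = -r.flatten()

# Equality constraints: flow balance per state, plus normalization.
A_eq = np.zeros((nS + 1, nvar))
for j in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[j, s * nA + a] = (j == s) - P[s, a, j]
A_eq[nS, :] = 1.0            # frequencies form a probability distribution
b_eq = np.zeros(nS + 1)
b_eq[nS] = 1.0

# One linear constraint on the state-action frequencies (the paper allows many).
A_ub = c.flatten()[None, :]
b_ub = np.array([budget])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nvar)
x = res.x.reshape(nS, nA)

# Recover a stationary randomized policy: pi(a|s) = x[s,a] / sum_a x[s,a],
# defaulting to uniform in states with zero frequency.
freq = x.sum(axis=1, keepdims=True)
policy = np.divide(x, freq, out=np.full_like(x, 1.0 / nA), where=freq > 1e-12)
print("average reward:", -res.fun)
print("policy (rows = states):", policy)
```

Reading pi(a|s) = x(s,a) / sum_a x(s,a) off the LP solution is the standard unichain construction; as the abstract notes, in the general multichain case an optimal policy may instead have to be randomized and nonstationary, which is what the paper's algorithm computes.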
ISSN: 0364-765X; 1526-5471
DOI: 10.1287/moor.9.2.276