Optimistic Value Iteration
Format: Article
Language: English
Summary: Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset.
DOI: 10.48550/arxiv.1910.01100
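
The summary outlines a two-phase loop: iterate from below until the lower bound looks converged, optimistically guess an upper bound, then try to prove the guess correct. The following is a minimal Python sketch of that idea for maximal reachability probabilities. The dictionary-based MDP encoding, the verification budget, and the eps-halving retry are illustrative assumptions of this sketch, not the algorithm as implemented in the Modest Toolset.

```python
import numpy as np

def bellman(mdp, values, goal):
    """One Bellman backup for maximal reachability probabilities.

    mdp maps each state index 0..n-1 to a list of actions; each action is
    a list of (successor, probability) pairs. Goal states have value 1.
    (This toy encoding is an assumption made for illustration.)
    """
    new = np.empty_like(values)
    for s, actions in mdp.items():
        if s in goal:
            new[s] = 1.0
        else:
            new[s] = max(sum(p * values[t] for t, p in a) for a in actions)
    return new

def optimistic_value_iteration(mdp, goal, eps=1e-6):
    lower = np.zeros(len(mdp))
    while True:
        # Phase 1: standard value iteration from below, stopped by the
        # usual (unsound on its own) small-change heuristic.
        while True:
            nxt = bellman(mdp, lower, goal)
            done = np.max(np.abs(nxt - lower)) < eps
            lower = nxt
            if done:
                break
        # Phase 2: optimistically guess an upper bound close to the lower
        # bound and try to verify it. Probabilities are capped at 1.
        upper = np.minimum(lower + eps, 1.0)
        for _ in range(1000):  # verification budget (arbitrary in this sketch)
            up_next = bellman(mdp, upper, goal)
            if np.all(up_next <= upper):
                # upper is inductive (a pre-fixed point of the monotone
                # Bellman operator), hence a true upper bound: done.
                return lower, upper
            if np.all(up_next >= upper):
                break  # the guess was certainly too low; refine and retry
            upper = up_next
            lower = bellman(mdp, lower, goal)  # keep tightening the lower bound
        eps /= 2.0  # halve the tolerance and restart from Phase 1
```

A small usage example under the same assumed encoding:

```python
# Toy model: from state 0, one action reaches goal state 1 with probability
# 0.5 or stays in 0; the true maximal reachability probability is 1.
mdp = {0: [[(1, 0.5), (0, 0.5)]], 1: [[(1, 1.0)]]}
lo, hi = optimistic_value_iteration(mdp, goal={1})
assert lo <= 1.0 <= hi  # the returned pair brackets the true value
```

The key point the sketch illustrates is that soundness comes entirely from the verification step: a candidate upper bound is accepted only once a Bellman backup maps it below itself, which by monotonicity proves it dominates the least fixed point.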