GUBS criterion: Arbitrary trade-offs between cost and probability-to-goal in stochastic planning based on Expected Utility Theory
Published in: | Artificial intelligence 2023-03, Vol.316, p.103848, Article 103848 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Stochastic Shortest Path MDPs (SSP-MDPs) are used to model probabilistic sequential decision problems where the objective is to minimize the expected accumulated cost to goal. However, in the presence of dead-ends, the conventional criterion for SSP-MDPs, which minimizes the expected accumulated cost, can become ill-defined. Lexicographic criteria can solve this by preferring policies that reach the goal with the highest possible probability. Other criteria can instead make a trade-off between some cost measure and probability-to-goal. However, both of these approaches can lead to policies that might not represent the choice of a real decision-maker. In this work, we propose the GUBS criterion to address these problems. GUBS combines goal prioritization over histories with Expected Utility Theory and is the only criterion among all the criteria analyzed that not only allows for a trade-off between a large accumulated cost and a small loss in probability-to-goal, but also guarantees arbitrary trade-offs that can be tuned through its parameters without prior knowledge of the problem being solved. We also propose eGUBS, which is a particular case of GUBS in which the exponential utility function is used, and two algorithms for optimally solving these problems: eGUBS-VI, a VI-based algorithm; and eGUBS-AO*, a heuristic search algorithm. Results indicate that, when a good heuristic function is available or when the state space is too large, eGUBS-AO* can perform better than eGUBS-VI by searching efficiently. In other cases, eGUBS-VI's simpler approach might give better results. |
---|---|
ISSN: | 0004-3702 1872-7921 |
DOI: | 10.1016/j.artint.2022.103848 |
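
The trade-off described in the abstract can be illustrated with a small numeric sketch. The record does not state the utility formula, so everything here is an assumption made for illustration: a history is scored as `exp(-lam * cost)` plus a constant bonus `k_g` if it reaches the goal, a dead-end history is given infinite cost, and the names `lam`, `k_g`, `expected_utility`, `preferred`, and the two toy policies are hypothetical. It is not the eGUBS-VI or eGUBS-AO* algorithm, only a comparison of two fixed outcome distributions.

```python
import math


def expected_utility(outcomes, lam, k_g):
    """Expected utility of a policy given its outcome distribution.

    outcomes: iterable of (probability, accumulated_cost, reaches_goal).
    Assumed utility form (not taken from the record):
        exp(-lam * cost) + k_g if the history reaches the goal,
        exp(-lam * cost)       otherwise.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = 0.0
    for prob, cost, reaches_goal in outcomes:
        # An infinite cost (dead-end) contributes exp(-inf) == 0.0.
        utility = math.exp(-lam * cost) + (k_g if reaches_goal else 0.0)
        total += prob * utility
    return total


def preferred(policies, lam, k_g):
    """Name of the policy with the highest expected utility."""
    return max(policies, key=lambda name: expected_utility(policies[name], lam, k_g))


# Two hypothetical policies for the same toy problem.
POLICIES = {
    # Always reaches the goal, but at a high accumulated cost.
    "safe": [(1.0, 50.0, True)],
    # Cheap, but ends in a dead-end 5% of the time.
    "risky": [(0.95, 5.0, True), (0.05, math.inf, False)],
}

if __name__ == "__main__":
    for k_g in (1.0, 20.0):
        print(k_g, preferred(POLICIES, lam=0.1, k_g=k_g))
```

Under these assumed numbers, a small goal bonus favors the cheap policy despite its 5% loss in probability-to-goal, while a larger bonus favors the policy that always reaches the goal; with `lam=0.1` the switch happens at a `k_g` of roughly 11.4. Tuning the two parameters moves that switching point, which is the kind of adjustable trade-off the abstract refers to.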