Tree Search-Based Policy Optimization under Stochastic Execution Delay
Main authors: | , , , |
Format: | Article |
Language: | English |
Subject headings: | |
Online access: | Order full text |
Abstract: | The standard formulation of Markov decision processes (MDPs) assumes that the agent's decisions are executed immediately. However, in numerous realistic applications such as robotics or healthcare, actions are performed with a delay whose value can even be stochastic. In this work, we introduce stochastic delayed execution MDPs, a new formalism that addresses random delays without resorting to state augmentation. We show that, given observed delay values, it suffices to search within the class of Markov policies to reach optimal performance, thus extending the deterministic fixed-delay case. Armed with this insight, we devise DEZ, a model-based algorithm that optimizes over the class of Markov policies. DEZ leverages Monte Carlo tree search, like its non-delayed variant EfficientZero, to accurately infer future states from the action queue. It thus handles delayed execution while preserving the sample efficiency of EfficientZero. Through a series of experiments on the Atari suite, we demonstrate that although the previous baseline outperforms the naive method in scenarios with constant delay, it underperforms in the face of stochastic delays. In contrast, our approach significantly outperforms the baselines under both constant and stochastic delays. The code is available at http://github.com/davidva1/Delayed-EZ. |
DOI: | 10.48550/arxiv.2404.05440 |
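The mechanism the abstract describes, choosing an action for the future state at which it will actually execute by rolling the pending action queue forward through a learned model, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `dynamics`, `policy`, `infer_execution_state`, and `act_with_delay` are hypothetical, and `policy` merely stands in for the MCTS planner that DEZ runs over EfficientZero's learned model.

```python
# Minimal sketch of acting under observed execution delay, assuming a learned
# one-step dynamics model `dynamics(state, action) -> next_state` (hypothetical
# interface, standing in for EfficientZero's learned model).

from typing import Callable, Deque, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def infer_execution_state(
    state: State,
    pending: Deque[Action],
    dynamics: Callable[[State, Action], State],
) -> State:
    """Estimate the state at which a newly chosen action will take effect
    by simulating the already-committed (queued) actions with the model."""
    for action in pending:
        state = dynamics(state, action)
    return state


def act_with_delay(
    observed_state: State,
    pending: Deque[Action],
    dynamics: Callable[[State, Action], State],
    policy: Callable[[State], Action],
) -> Action:
    """Choose an action for the future time step at which it will execute.

    `policy` only needs the inferred execution-time state, which reflects the
    abstract's claim that searching over Markov policies suffices once the
    delay values are observed.
    """
    future_state = infer_execution_state(observed_state, pending, dynamics)
    action = policy(future_state)
    pending.append(action)  # the action joins the queue until its delay elapses
    return action


if __name__ == "__main__":
    # Toy demo: states and actions are ints, the "model" adds the action.
    from collections import deque

    queue = deque([1, 2])           # two actions already in flight
    add = lambda s, a: s + a        # stand-in learned model
    greedy = lambda s: -s           # stand-in planner
    print(act_with_delay(0, queue, add, greedy))  # plans from state 0+1+2 = 3
```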