Regular Decision Processes for Grid Worlds
Abstract: Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, interest has grown in recent years in combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior-learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support non-Markovian reward functions as well as non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
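To make the non-Markovian flavor of such grid worlds concrete, here is a minimal sketch. It is an illustrative assumption, not the paper's tool chain: the names `WAYPOINT`, `GOAL`, and `dfa_step`, and the specific two-state automaton, are hypothetical. The reward for reaching the goal cell depends on whether a waypoint was visited earlier in the episode, a regular temporal condition that a small finite automaton can track:

```python
# Hypothetical illustration of a regular, non-Markovian reward in a grid
# world: the goal pays off only after a waypoint has been visited. The
# temporal dependency is tracked by a two-state deterministic automaton.

WAYPOINT = (0, 2)  # must be visited first (illustrative choice)
GOAL = (2, 2)      # rewarding only afterwards (illustrative choice)

def dfa_step(q, cell):
    """Advance the automaton on the observed cell."""
    if q == 0 and cell == WAYPOINT:
        return 1  # waypoint seen; the goal is now rewarding
    return q

def reward(q, cell):
    """Reward depends on the automaton state, not just the current cell."""
    return 1.0 if (q == 1 and cell == GOAL) else 0.0

# Walk a fixed trajectory: the same cell yields different reward
# depending on what was visited before, so history matters.
trajectory = [(0, 0), (0, 1), (2, 2), (0, 2), (1, 2), (2, 2)]
q, total = 0, 0.0
for cell in trajectory:
    q = dfa_step(q, cell)
    total += reward(q, cell)
print(total)  # 1.0 -- only the visit to GOAL after the waypoint pays
```

Composing the automaton state with the grid position yields a product model that is Markovian again, which is, in essence, how regular decision processes make such temporal dependencies tractable.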
DOI: 10.48550/arxiv.2111.03647