Exact Reduction of Huge Action Spaces in General Reinforcement Learning
The reinforcement learning (RL) framework formalizes the notion of learning with interactions. Many real-world problems have large state-spaces and/or action-spaces such as in Go, StarCraft, protein folding, and robotics or are non-Markovian, which cause significant challenges to RL algorithms. In t...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The reinforcement learning (RL) framework formalizes the notion of learning
with interactions. Many real-world problems have large state-spaces and/or
action-spaces such as in Go, StarCraft, protein folding, and robotics or are
non-Markovian, which cause significant challenges to RL algorithms. In this
work we address the large action-space problem by sequentializing actions,
which can reduce the action-space size significantly, even down to two actions
at the expense of an increased planning horizon. We provide explicit and exact
constructions and equivalence proofs for all quantities of interest for
arbitrary history-based processes. In the case of MDPs, this could help RL
algorithms that bootstrap. In this work we show how action-binarization in the
non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a
fixed-sized non-Markovian state-space with the help of a surrogate Markovian
process. On the upside, ESA enjoys similar optimality guarantees as Markovian
models do. But a downside is that the size of the aggregated state-space
becomes exponential in the size of the action-space. In this work, we patch
this issue by binarizing the action-space. We provide an upper bound on the
number of states of this binarized ESA that is logarithmic in the original
action-space size, a double-exponential improvement. |
---|---|
DOI: | 10.48550/arxiv.2012.10200 |