Categorical semantics of compositional reinforcement learning
Saved in:

| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online Access: | Order full text |
Summary: Reinforcement learning (RL) often requires decomposing a problem into subtasks and composing learned behaviors on these tasks. Compositionality in RL has the potential to create modular subtask units that interface with other system capabilities. However, generating compositional models requires characterizing the minimal assumptions under which composition is robust. We develop a framework for a \emph{compositional theory} of RL from a categorical point of view. Given the categorical representation of compositionality, we investigate sufficient conditions under which learning-by-parts yields the same optimal policy as learning on the whole. In particular, our approach introduces a category $\mathsf{MDP}$, whose objects are Markov decision processes (MDPs) acting as models of tasks. We show that $\mathsf{MDP}$ admits natural compositional operations, such as certain fiber products and pushouts. These operations make compositional phenomena in RL explicit and unify existing constructions, such as puncturing hazardous states in composite MDPs and incorporating state-action symmetry. We also model sequential task completion by introducing the language of zig-zag diagrams, an immediate application of the pushout operation in $\mathsf{MDP}$.
DOI: 10.48550/arxiv.2208.13687
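
To give the flavor of the pushout operation the abstract mentions, the following is a minimal sketch, not the paper's construction: it glues two finite MDPs along a shared interface state, in the spirit of sequential task completion, by taking the union of their states, actions, and transitions. All names here (`MDP`, `glue_along`, the toy states) are hypothetical illustrations introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """A finite MDP given by its states, actions, and transitions."""
    states: frozenset
    actions: frozenset
    # transitions: maps (state, action) -> {next_state: probability}
    transitions: dict

def glue_along(m1: MDP, m2: MDP, interface: frozenset) -> MDP:
    """Naive pushout-style gluing: identify the `interface` states shared
    by m1 and m2, then take the union of states, actions, and transitions.
    Assumes the two MDPs agree on any transitions out of the interface."""
    assert interface <= m1.states and interface <= m2.states
    return MDP(
        states=m1.states | m2.states,
        actions=m1.actions | m2.actions,
        transitions={**m1.transitions, **m2.transitions},
    )

# Two toy subtasks sharing the state "door": first reach the door,
# then open it to reach the goal.
reach_door = MDP(
    states=frozenset({"start", "door"}),
    actions=frozenset({"go"}),
    transitions={("start", "go"): {"door": 1.0}},
)
open_door = MDP(
    states=frozenset({"door", "goal"}),
    actions=frozenset({"open"}),
    transitions={("door", "open"): {"goal": 1.0}},
)

composite = glue_along(reach_door, open_door, frozenset({"door"}))
print(sorted(composite.states))  # ['door', 'goal', 'start']
```

In the categorical picture, such a gluing would arise as the pushout of a span selecting the shared state in each task; the zig-zag diagrams mentioned in the abstract chain pushouts of this kind to model completing one subtask after another.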