Learning Action-based Representations Using Invariance
Format: Article
Language: English
Abstract: Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps, capturing long-horizon elements remains a challenging problem. Myopic controllability can capture the moment right before an agent crashes into a wall, but not the control-relevance of the wall while the agent is still some distance away. To address this, we introduce action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric that extends single-step controllability with a recursive invariance constraint. By doing this, action-bisimulation learns a multi-step controllability metric that smoothly discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments, including a photorealistic 3D simulation domain, Habitat. Additionally, we provide theoretical analysis and qualitative results demonstrating the information captured by action-bisimulation.
DOI: 10.48550/arxiv.2403.16369
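
The abstract above describes two ingredients: a single-step controllability (inverse dynamics) signal and a recursive, bisimulation-style invariance constraint that smoothly discounts control-relevant features over longer horizons. The snippet below is a minimal sketch of how such an objective could be wired up in PyTorch; the names (ActionBisimEncoder, single_step_loss, bisimulation_invariance_loss), network sizes, and the exact form of the losses are illustrative assumptions drawn from the abstract, not the authors' implementation.

```python
# Hedged sketch, assuming a discrete-action setting and flat observations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionBisimEncoder(nn.Module):
    """Maps raw observations to a compact, controllability-aware embedding."""

    def __init__(self, obs_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def single_step_loss(encoder, inverse_model, obs, next_obs, actions):
    """Inverse-dynamics term: predict the action from (z_t, z_{t+1}).

    This captures only myopic (one-step) controllability.
    """
    z, z_next = encoder(obs), encoder(next_obs)
    logits = inverse_model(torch.cat([z, z_next], dim=-1))
    return F.cross_entropy(logits, actions)


def bisimulation_invariance_loss(encoder, target_encoder, single_step_encoder,
                                 obs_i, obs_j, next_obs_i, next_obs_j,
                                 gamma: float = 0.99):
    """Recursive invariance term (assumed form).

    The distance between two embeddings is regressed onto the one-step
    controllability difference plus a discounted distance between the
    embeddings of the corresponding next states, so control-relevant
    features that only matter further in the future are retained with a
    smooth discount.
    """
    d = torch.norm(encoder(obs_i) - encoder(obs_j), dim=-1)
    with torch.no_grad():
        # One-step controllability difference from a frozen single-step encoder.
        d_ctrl = torch.norm(single_step_encoder(obs_i) - single_step_encoder(obs_j), dim=-1)
        # Bootstrapped long-horizon term from a slow-moving target encoder.
        d_next = torch.norm(target_encoder(next_obs_i) - target_encoder(next_obs_j), dim=-1)
    return F.mse_loss(d, d_ctrl + gamma * d_next)
```

In this sketch the single-step encoder is trained first with the inverse-dynamics term and then frozen, while the recursive term bootstraps against a slow-moving target copy of the main encoder; both choices are assumptions about how the recursive constraint could be stabilized, in the spirit of the reward-free, uniformly random pretraining setup the abstract describes.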