MoCoDA: Model-based Counterfactual Data Augmentation
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | The number of states in a dynamic process is exponential in the number of
objects, making reinforcement learning (RL) difficult in complex, multi-object
domains. For agents to scale to the real world, they will need to react to and
reason about unseen combinations of objects. We argue that the ability to
recognize and use local factorization in transition dynamics is a key element
in unlocking the power of multi-object reasoning. To this end, we show that (1)
known local structure in the environment transitions is sufficient for an
exponential reduction in the sample complexity of training a dynamics model,
and (2) a locally factored dynamics model provably generalizes
out-of-distribution to unseen states and actions. Knowing the local structure
also allows us to predict which unseen states and actions this dynamics model
will generalize to. We propose to leverage these observations in a novel
Model-based Counterfactual Data Augmentation (MoCoDA) framework. MoCoDA applies
a learned locally factored dynamics model to an augmented distribution of
states and actions to generate counterfactual transitions for RL. MoCoDA works
with a broader set of local structures than prior work and allows for direct
control over the augmented training distribution. We show that MoCoDA enables
RL agents to learn policies that generalize to unseen states and actions. We
use MoCoDA to train an offline RL agent to solve an out-of-distribution
robotics manipulation task on which standard offline RL algorithms fail. |
---|---|
DOI: | 10.48550/arxiv.2210.11287 |
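
The abstract's central idea, that a factored dynamics model can label state-action combinations never seen in the data, can be illustrated with a small toy sketch. This is not the paper's implementation: the two-object grid world, the per-object lookup tables standing in for a learned model, and all names (`true_step`, `fit_factored_model`, `augment`) are assumptions made for illustration. It also assumes the simplest case, where each object evolves fully independently, which is narrower than the local structures the abstract refers to.

```python
import itertools

N_POS = 5             # each object sits at a position 0..4 (toy assumption)
ACTIONS = (-1, 0, 1)  # per-object action: move left, stay, move right


def true_step(pos, act):
    """Ground-truth per-object dynamics: move, then clamp to the grid."""
    return min(max(pos + act, 0), N_POS - 1)


def collect_diagonal_data():
    """Empirical data in which both objects always share the same position."""
    data = []
    for p in range(N_POS):
        for a0, a1 in itertools.product(ACTIONS, ACTIONS):
            state, action = (p, p), (a0, a1)
            next_state = (true_step(p, a0), true_step(p, a1))
            data.append((state, action, next_state))
    return data


def fit_factored_model(data):
    """One table per object, keyed only on that object's (pos, act).

    A tabular stand-in for a learned, factored dynamics model.
    """
    tables = [{}, {}]
    for state, action, next_state in data:
        for i in range(2):
            tables[i][(state[i], action[i])] = next_state[i]
    return tables


def augment(tables):
    """Recombine per-object inputs from the data and label them with the model."""
    transitions = []
    for k0, k1 in itertools.product(tables[0], tables[1]):
        state, action = (k0[0], k1[0]), (k0[1], k1[1])
        next_state = (tables[0][k0], tables[1][k1])
        transitions.append((state, action, next_state))
    return transitions


data = collect_diagonal_data()
tables = fit_factored_model(data)
augmented = augment(tables)

# Counterfactual transitions: (state, action) pairs absent from the data.
seen = {(s, a) for s, a, _ in data}
novel = [t for t in augmented if (t[0], t[1]) not in seen]
```

In this toy setting the empirical data only contains states where both objects share a position, yet the recombined set covers every pair of positions, and each generated transition agrees with `true_step`. A single table keyed on the joint state and action would have no entry for any of the novel pairs, which is the gap the factored model closes.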