Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
Saved in:
Main authors: ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In many real-world applications, collecting large and high-quality datasets may be too costly or impractical. Offline reinforcement learning (RL) aims to infer an optimal decision-making policy from a fixed set of data. Getting the most information from historical data is then vital for good performance once the policy is deployed. We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories. TS introduces unseen actions joining previously disconnected states: using a probabilistic notion of state reachability, it effectively 'stitches' together parts of the historical demonstrations to generate new, higher-quality ones. A stitching event consists of a transition between a pair of observed states through a synthetic and highly probable action. New actions are introduced only when they are expected to be beneficial, according to an estimated state-value function. We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy from the original dataset. The improved BC policy could then serve as a launchpad for online RL through planning and demonstration-guided RL.
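
The abstract describes the acceptance test behind a stitching event: a synthetic action must be highly probable under a learned dynamics model, and the candidate next state must have a higher estimated value than the original one. The sketch below illustrates one way such a test could look; the names and models used here (propose_stitch, dynamics_logprob, inverse_model, value_fn, reach_threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a Trajectory Stitching acceptance test, assuming a learned
# forward dynamics model, an inverse-dynamics action proposer, and a state-value
# estimate. All interfaces below are hypothetical stand-ins.
import numpy as np

def propose_stitch(s_t, s_next_orig, candidate_states, dynamics_logprob,
                   inverse_model, value_fn, reach_threshold=-5.0):
    """Return (new_next_state, synthetic_action) for the best beneficial stitch,
    or None if no candidate is both reachable and value-improving.

    s_t              : current observed state
    s_next_orig      : next state in the original trajectory
    candidate_states : observed states taken from other trajectories
    dynamics_logprob : log p(s' | s, a) under a learned forward model
    inverse_model    : proposes a synthetic action a given (s, s')
    value_fn         : estimated state-value function V(s)
    """
    best = None
    baseline_value = value_fn(s_next_orig)
    for s_cand in candidate_states:
        a_hat = inverse_model(s_t, s_cand)                  # synthetic action joining the two states
        reachable = dynamics_logprob(s_cand, s_t, a_hat) > reach_threshold
        beneficial = value_fn(s_cand) > baseline_value      # only stitch if value improves
        if reachable and beneficial:
            if best is None or value_fn(s_cand) > value_fn(best[0]):
                best = (s_cand, a_hat)
    return best

# Toy usage with stand-in models, for illustration only.
rng = np.random.default_rng(0)
value_fn = lambda s: float(s.sum())                          # pretend higher-sum states are better
inverse_model = lambda s, s2: s2 - s                         # pretend the action is the state difference
dynamics_logprob = lambda s2, s, a: -float(np.linalg.norm(s + a - s2))
s_t, s_next = rng.normal(size=3), rng.normal(size=3)
candidates = [rng.normal(size=3) for _ in range(10)]
print(propose_stitch(s_t, s_next, candidates, dynamics_logprob, inverse_model, value_fn))
```

In this reading, the reachability check plays the role of the abstract's "synthetic and highly probable action", while the value comparison implements "new actions are introduced only when they are expected to be beneficial".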
DOI: 10.48550/arxiv.2211.11603