Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-09, Vol. PP, pp. 1-13
Main authors: , , , ,
Format: Article
Language: English
Online access: Order full text
Abstract: Deep reinforcement learning (RL) has witnessed remarkable success in a wide range of control tasks. To overcome RL's notorious sample inefficiency, prior studies have explored data augmentation techniques leveraging collected transition data. However, these methods face challenges in synthesizing transitions that adhere to the authentic environment dynamics, especially when the transition is high-dimensional and includes many features that are redundant or irrelevant to the task. In this article, we introduce continuous value assignment (CVA), an innovative optimization-level data augmentation approach that directly synthesizes novel training data in the state-action value space, effectively bypassing the need for explicit transition modeling. The key intuition of our method is that the transition plays an intermediate role in calculating the state-action value during optimization, and therefore directly augmenting the state-action value is more causally related to the optimization process. Specifically, our CVA combines parameterized value prediction and nonparametric value interpolation from neighboring states, resulting in doubly robust target values w.r.t. novel states and actions. Extensive experiments demonstrate CVA's substantial improvements in sample efficiency across complex continuous control tasks, surpassing several advanced baselines.
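The abstract describes combining a parameterized value prediction with a nonparametric interpolation over neighboring states into a doubly robust target. A minimal sketch of that combination is below; it is illustrative only, not the authors' CVA implementation. The linear stand-in for the Q-network, the inverse-distance k-nearest-neighbor interpolation, and the fixed mixing weight `alpha` are all assumptions introduced here for clarity.

```python
import numpy as np

def parametric_q(state, action, w):
    """Stand-in for a learned Q-network: a simple linear value model
    over the concatenated state-action features (hypothetical)."""
    return float(np.dot(w, np.concatenate([state, action])))

def interpolated_q(state, buffer_states, buffer_values, k=3):
    """Nonparametric estimate: inverse-distance weighted average of the
    values stored for the k nearest neighboring states in the buffer."""
    dists = np.linalg.norm(buffer_states - state, axis=1)
    idx = np.argsort(dists)[:k]
    weights = 1.0 / (dists[idx] + 1e-8)  # closer neighbors weigh more
    weights /= weights.sum()
    return float(np.dot(weights, buffer_values[idx]))

def doubly_robust_target(state, action, w, buffer_states, buffer_values,
                         alpha=0.5):
    """Convex combination of the two estimators: if either the parametric
    or the nonparametric estimate is accurate for a novel state-action
    pair, the blended target stays close to the true value."""
    return alpha * parametric_q(state, action, w) + \
        (1.0 - alpha) * interpolated_q(state, buffer_states, buffer_values)
```

With `alpha=0.5` the target is simply the midpoint of the two estimates; in practice the weighting between the parametric and nonparametric components would itself be tuned or learned.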
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2024.3435406