Domain Randomization via Entropy Maximization
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Summary: | Varying dynamics parameters in simulation is a popular Domain Randomization
(DR) approach for overcoming the reality gap in Reinforcement Learning (RL).
Nevertheless, DR heavily hinges on the choice of the sampling distribution of
the dynamics parameters, since high variability is crucial to regularize the
agent's behavior but notoriously leads to overly conservative policies when
randomizing excessively. In this paper, we propose a novel approach to address
sim-to-real transfer, which automatically shapes dynamics distributions during
training in simulation without requiring real-world data. We introduce DOmain
RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization
problem that directly maximizes the entropy of the training distribution while
retaining generalization capabilities. In achieving this, DORAEMON gradually
increases the diversity of sampled dynamics parameters as long as the
probability of success of the current policy is sufficiently high. We
empirically validate the consistent benefits of DORAEMON in obtaining highly
adaptive and generalizable policies, i.e. solving the task at hand across the
widest range of dynamics parameters, as opposed to representative baselines
from the DR literature. Notably, we also demonstrate the Sim2Real applicability
of DORAEMON through its successful zero-shot transfer in a robotic manipulation
setup under unknown real-world parameters. |
---|---|
DOI: | 10.48550/arxiv.2311.01885 |
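
The summary describes gradually increasing the diversity of sampled dynamics parameters as long as the current policy's probability of success stays sufficiently high. The following is a minimal toy sketch of that idea only, not the authors' algorithm: it assumes a single dynamics parameter drawn from a uniform distribution, a fixed stand-in for the policy, and a simple multiplicative widening rule. All names and constants (`NOMINAL`, `ALPHA`, `GROWTH`, `rollout_succeeds`, `estimate_success_rate`, `widen_distribution`) are hypothetical and chosen for illustration.

```python
import math
import random

NOMINAL = 1.0   # hypothetical nominal value of one dynamics parameter (e.g. a mass)
ALPHA = 0.8     # hypothetical minimum success rate required to keep widening
GROWTH = 1.05   # hypothetical multiplicative step applied to the distribution width


def rollout_succeeds(param: float) -> bool:
    """Stand-in for running the current policy in simulation.

    Assumption: the policy solves the task only when the parameter lies
    within 0.5 of the nominal value. A real setup would run an episode.
    """
    return abs(param - NOMINAL) < 0.5


def estimate_success_rate(width: float, rng: random.Random, n: int = 2000) -> float:
    """Monte Carlo estimate of the success rate under a uniform distribution
    of the given width centered on NOMINAL."""
    low, high = NOMINAL - width / 2, NOMINAL + width / 2
    hits = sum(rollout_succeeds(rng.uniform(low, high)) for _ in range(n))
    return hits / n


def widen_distribution(initial_width: float = 0.1, iterations: int = 100, seed: int = 0):
    """Grow the width (and hence the entropy) while the success rate stays >= ALPHA."""
    rng = random.Random(seed)
    width = initial_width
    entropies = []
    for _ in range(iterations):
        if estimate_success_rate(width, rng) >= ALPHA:
            width *= GROWTH  # success is high enough: increase diversity
        # Otherwise hold the distribution; in practice the policy would keep training.
        entropies.append(math.log(width))  # differential entropy of a uniform is log(width)
    return width, entropies


final_width, entropies = widen_distribution()
print(f"final width: {final_width:.3f}, final entropy: {entropies[-1]:.3f}")
```

In this toy setting the width stops growing once roughly a fifth of the sampled parameters fall outside the region the stand-in policy can handle. In an actual training loop the policy would be updated between steps, so the feasible region, and with it the attainable entropy, could keep expanding.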