Parameter Space Noise for Exploration
Format: Article
Language: English
Online access: Order full text
Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods yields the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than either traditional RL with action space noise or evolutionary strategies alone.
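To make the comparison concrete, below is a minimal PyTorch sketch of the two exploration schemes the abstract contrasts. The toy network, the noise scale `sigma`, and the helper names (`act_with_action_noise`, `make_perturbed_policy`) are illustrative assumptions, not the paper's implementation, which builds on DQN, DDPG, and TRPO:

```python
import copy

import torch
import torch.nn as nn

# Illustrative policy network; the paper's experiments use DQN, DDPG,
# and TRPO agents, not this toy model.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))


def act_with_action_noise(obs, sigma=0.1):
    """Traditional exploration: draw fresh Gaussian noise in the
    action space at every single step."""
    with torch.no_grad():
        action = policy(obs)
    return action + sigma * torch.randn_like(action)


def make_perturbed_policy(sigma=0.1):
    """Parameter space noise: perturb a copy of the weights once and
    act greedily with it for a whole episode, so the exploratory
    behavior stays temporally consistent."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))
    return noisy


# Sample one perturbation per episode rather than fresh noise per step.
obs = torch.randn(1, 4)
noisy_policy = make_perturbed_policy()
with torch.no_grad():
    action = noisy_policy(obs)
```

Sampling the perturbation once per episode is what preserves the temporal structure that per-step action noise discards.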
DOI: 10.48550/arxiv.1706.01905