A recurrent reinforcement learning approach applicable to highly uncertain environments
Published in: International Journal of Advanced Robotic Systems 2020-03, Vol.17 (2), p.172988142091625
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Reinforcement learning has been a promising approach in control and robotics, since data-driven learning reduces the need for engineering knowledge. However, it usually requires many interactions with the environment to train a controller. This is a practical limitation in some real environments, for example robots, where interactions with the environment are restricted and time-inefficient. Learning is therefore generally conducted in a simulation environment, and the learned policy is afterwards migrated to the real environment. However, differences between the simulation environment and the real environment, for example friction coefficients at joints or changing loads, may cause undesired results after migration. To solve this problem, most learning approaches concentrate on retraining, system or parameter identification, or adaptive policy training. In this article, we propose an approach in which an adaptive policy is learned by extracting more information from the data. An environmental encoder, which indirectly reflects the parameters of an environment, is trained by explicitly incorporating model uncertainties into long-term planning and policy learning. This approach can identify environment differences when the learned policy is migrated to a real environment, thus increasing the adaptability of the policy. Moreover, its applicability to autonomous learning in control tasks is also verified.
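The core idea in the abstract, a recurrent environmental encoder whose latent output conditions the policy, can be illustrated with a minimal sketch. This is not the authors' implementation; all dimensions, class names, and the tanh recurrent cell are illustrative assumptions, and no training or uncertainty modeling is shown:

```python
import numpy as np

# Hypothetical dimensions; purely illustrative, not taken from the article.
OBS_DIM, ACT_DIM, LATENT_DIM = 4, 2, 3

rng = np.random.default_rng(0)

class EnvEncoder:
    """Recurrent encoder: compresses the observation-action history into a
    latent vector z that implicitly reflects environment parameters
    (e.g. joint friction, load)."""
    def __init__(self):
        self.W_in = rng.standard_normal((LATENT_DIM, OBS_DIM + ACT_DIM)) * 0.1
        self.W_rec = rng.standard_normal((LATENT_DIM, LATENT_DIM)) * 0.1
        self.z = np.zeros(LATENT_DIM)

    def step(self, obs, act):
        # Simple tanh recurrent update over the stream of transitions.
        self.z = np.tanh(self.W_in @ np.concatenate([obs, act])
                         + self.W_rec @ self.z)
        return self.z

class AdaptivePolicy:
    """Policy conditioned on both the observation and the latent encoding,
    so one set of weights can adapt across environments whose dynamics
    differ from the simulator used for training."""
    def __init__(self):
        self.W = rng.standard_normal((ACT_DIM, OBS_DIM + LATENT_DIM)) * 0.1

    def act(self, obs, z):
        return np.tanh(self.W @ np.concatenate([obs, z]))

encoder, policy = EnvEncoder(), AdaptivePolicy()
obs, act = np.zeros(OBS_DIM), np.zeros(ACT_DIM)
for _ in range(10):                       # roll out a short trajectory
    z = encoder.step(obs, act)            # update the environment estimate
    act = policy.act(obs, z)              # act conditioned on that estimate
    obs = rng.standard_normal(OBS_DIM)    # stand-in for an env transition
```

Because the encoder runs online at deployment time, the same mechanism that identifies simulator dynamics during training can, in principle, pick up the real environment's parameters after migration, which is the adaptability the abstract claims.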
ISSN: 1729-8806, 1729-8814
DOI: 10.1177/1729881420916258