Path Integral Policy Improvement With Population Adaptation

Bibliographic Details
Published in:	IEEE Transactions on Cybernetics, January 2022, Vol. 52, No. 1, pp. 312-322
Authors:	Yamamoto, Kosuke; Ariizumi, Ryo; Hayakawa, Tomohiro; Matsuno, Fumitoshi
Format:	Article
Language:	English
Subjects:	Algorithms; Automation; Covariance matrices; Dynamical systems; Evolution strategy; Heuristic algorithms; legged robot; Machine learning; Motion; Optimization; Parameters; Policy; reinforcement learning (RL); Reinforcement, Psychology; Robotics; Robots; Sociology; Task analysis; Trajectory

Description
Abstract:	Path integral policy improvement (PI²) is known to be an efficient reinforcement learning algorithm, particularly if the target system is a high-dimensional dynamical system. However, PI² and its existing extensions have adjustable parameters on which their efficiency depends significantly. This article proposes an extension of PI² that adjusts all of the critical parameters automatically. Motion acquisition tasks for three different types of simulated legged robots were performed to test the efficacy of the proposed algorithm. The results show that the proposed method can not only eliminate the burden on the user of setting the parameters appropriately but also improve the optimization performance significantly. For one of the acquired motions, a real robot experiment was conducted to show the validity of the motion.
ISSN:	2168-2267, 2168-2275
DOI:	10.1109/TCYB.2020.2983923
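To illustrate what the "adjustable parameters" of PI² can look like, the following is a minimal sketch of one update step in a simplified, parameter-space variant of PI²: sample a population of perturbed parameter vectors, weight them by an exponential of their normalized cost, and take the probability-weighted average. This is an assumption-laden illustration, not the population-adaptation method proposed in the article (the abstract does not give its update rules). The helper `pi_bb_update`, the toy cost `sphere`, and the values chosen for population size `n_samples`, exploration noise `sigma`, and temperature `h` are all hypothetical.

```python
import math
import random


def pi_bb_update(theta, cost_fn, n_samples=10, sigma=0.1, h=10.0, rng=None):
    """One iteration of a simplified PI^2-style update (hypothetical helper).

    theta     -- current parameter vector (list of floats)
    cost_fn   -- maps a parameter vector to a scalar cost (lower is better)
    n_samples -- population size K (an adjustable parameter)
    sigma     -- std. dev. of isotropic Gaussian exploration noise (adjustable)
    h         -- temperature of the exponential cost weighting (adjustable)
    """
    if rng is None:
        rng = random.Random()
    # Sample a population of perturbed parameter vectors around theta.
    samples = [[t + rng.gauss(0.0, sigma) for t in theta] for _ in range(n_samples)]
    costs = [cost_fn(s) for s in samples]
    # Normalize costs to [0, 1] so that h has a scale-independent meaning.
    lo, hi = min(costs), max(costs)
    span = hi - lo if hi > lo else 1.0
    # Lower cost -> larger weight; weights are normalized to sum to 1.
    raw = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(raw)
    weights = [w / total for w in raw]
    # New parameters: probability-weighted average of the sampled vectors.
    return [sum(w * s[i] for w, s in zip(weights, samples)) for i in range(len(theta))]


def sphere(x):
    """Toy cost function (hypothetical): squared distance from the origin."""
    return sum(v * v for v in x)


rng = random.Random(0)
theta0 = [1.0, -2.0, 0.5]
theta = theta0
for _ in range(200):
    theta = pi_bb_update(theta, sphere, n_samples=20, sigma=0.1, h=10.0, rng=rng)
print(sphere(theta0), sphere(theta))
```

In this sketch `sigma`, `h`, and `n_samples` stay fixed for the whole run, so the step size and the greediness of the weighting never change; picking them by hand is the kind of user burden the abstract says the proposed extension removes by adjusting the critical parameters automatically.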