Automatic Temperature Parameter Tuning for Reinforcement Learning Using Path Integral Policy Improvement


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2023-09, Vol. PP, pp. 1-12
Authors: Nakano, Hiroyasu; Ariizumi, Ryo; Asai, Toru; Azuma, Shun-Ichi
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation (PI²-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for continuous robot behaviors. PI²-CMA has a hyperparameter called the temperature parameter, whose value is critical to performance; however, little research has addressed it, and the existing tuning method still contains a tunable parameter that can itself be critical to performance, so tuning by trial and error remains necessary. Moreover, we show that there are problem settings the existing method cannot learn. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirmed the effectiveness of the proposed method through numerical tests.
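For context on where the temperature parameter enters, the following is a minimal sketch of the standard PI² exponential cost weighting (with the cost-spread normalization heuristic commonly used in PI²-CMA). It is a generic illustration, not the paper's proposed auto-tuning rule; the function name and the default value of `h` are assumptions for the example.

```python
import numpy as np

def pi2_weights(costs, h=10.0):
    """Standard PI^2 softmax weighting of rollout costs.

    Rollout i receives weight proportional to exp(-S_i / lambda).
    A common heuristic fixes the temperature from the cost spread,
    giving w_i ∝ exp(-h * (S_i - S_min) / (S_max - S_min)),
    where h plays the role of an inverse temperature that must
    otherwise be tuned by hand.
    """
    costs = np.asarray(costs, dtype=float)
    s_min, s_max = costs.min(), costs.max()
    if s_max == s_min:
        # All rollouts cost the same: weight them uniformly.
        return np.full(costs.shape, 1.0 / costs.size)
    w = np.exp(-h * (costs - s_min) / (s_max - s_min))
    return w / w.sum()
```

Lower-cost rollouts receive larger weights, and the policy parameters are updated toward the weighted average of the sampled perturbations; the sharpness of that weighting is exactly what the temperature parameter controls.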
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2023.3312857