Automatic Temperature Parameter Tuning for Reinforcement Learning Using Path Integral Policy Improvement


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2023-09, Vol. PP, pp. 1-12
Authors: Nakano, Hiroyasu; Ariizumi, Ryo; Asai, Toru; Azuma, Shun-Ichi
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation (PI²-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for continuous robot behaviors. PI²-CMA has a hyperparameter called the temperature parameter, whose value is critical to performance; however, little research has addressed it, and the existing tuning method still contains a tunable parameter that can itself be critical to performance, so tuning by trial and error remains necessary. Moreover, we show that there are problem settings the existing method cannot learn. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirmed the effectiveness of the proposed method through numerical tests.
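For context on where the temperature parameter enters, the following is a minimal sketch of the standard PI² exponential cost weighting (with the cost-spread normalization heuristic commonly used in PI²-CMA). It is a generic illustration, not the paper's proposed auto-tuning rule; the function name and the default value of `h` are assumptions for the example.

```python
import numpy as np

def pi2_weights(costs, h=10.0):
    """Standard PI^2 softmax weighting of rollout costs.

    Rollout i receives weight proportional to exp(-S_i / lambda).
    A common heuristic fixes the temperature from the cost spread,
    giving w_i ∝ exp(-h * (S_i - S_min) / (S_max - S_min)),
    where h plays the role of an inverse temperature that must
    otherwise be tuned by hand.
    """
    costs = np.asarray(costs, dtype=float)
    s_min, s_max = costs.min(), costs.max()
    if s_max == s_min:
        # All rollouts cost the same: weight them uniformly.
        return np.full(costs.shape, 1.0 / costs.size)
    w = np.exp(-h * (costs - s_min) / (s_max - s_min))
    return w / w.sum()
```

Lower-cost rollouts receive larger weights, and the policy parameters are updated toward the weighted average of the sampled perturbations; the sharpness of that weighting is exactly what the temperature parameter controls.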
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2023.3312857