Automatic Temperature Parameter Tuning for Reinforcement Learning Using Path Integral Policy Improvement
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2023-09, Vol. PP, pp. 1-12 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation ($\text{PI}^{2}$-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for the continuous behavior of robots. $\text{PI}^{2}$-CMA has a hyperparameter called the temperature parameter, whose value is critical for performance. However, little research has been conducted on it, and the existing method for setting it still contains a tunable parameter that can itself be critical to performance, so tuning by trial and error remains necessary. Moreover, we show that there is a problem setting that cannot be learned by the existing method. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirmed the effectiveness of the proposed method using numerical tests. |
---|---|
ISSN: | 2162-237X, 2162-2388 |
DOI: | 10.1109/TNNLS.2023.3312857 |
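
To illustrate why the temperature parameter mentioned in the abstract matters, the following is a minimal Python sketch of the exponentiated-cost weighting commonly used in PI2-style updates, where a rollout with cost S_k receives a weight proportional to exp(-S_k / temperature). It is not the method proposed in the article, whose details are not given in this record. The function names, costs, and parameter samples are hypothetical and chosen only for illustration.

```python
import math


def pi2_weights(costs, temperature):
    """Map rollout costs to normalized weights via exp(-cost / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the minimum cost avoids underflow and leaves the weights unchanged.
    best = min(costs)
    unnormalized = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_mean(samples, weights):
    """Weighted average of scalar parameter samples."""
    return sum(w * s for w, s in zip(weights, samples))


# Hypothetical rollout costs and the scalar policy parameters that produced them.
costs = [1.0, 2.0, 4.0, 8.0]
samples = [0.1, 0.4, -0.2, 0.9]

for temperature in (0.1, 1.0, 100.0):
    weights = pi2_weights(costs, temperature)
    rounded = [round(w, 3) for w in weights]
    print(temperature, rounded, round(weighted_mean(samples, weights), 3))
```

A small temperature concentrates almost all weight on the lowest-cost rollout, while a large one approaches a uniform average over all rollouts, which suggests why a fixed value can be hard to choose by hand.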