Reinforcement Learning for Penalty Avoidance in Continuous State Spaces

Bibliographic Details
Published in: Journal of Advanced Computational Intelligence and Intelligent Informatics, 2007-07, Vol. 11 (6), p. 668-676
Main Authors: Miyazaki, Kazuteru; Kobayashi, Shigenobu
Format: Article
Language: English
Online Access: Full text
Abstract: Reinforcement learning involves learning to adapt to an environment through the presentation of rewards, special inputs that serve as clues. To obtain rational policies quickly, profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] are used; these are collectively called PS-based methods. When applying reinforcement learning to real problems, continuous-valued input must sometimes be handled. A method [10] based on RPM has been proposed as a PS-based method for continuous-valued input, but it assumes environments containing only rewards and cannot suitably handle penalties. We studied the treatment of continuous-valued input suited to PS-based methods in environments that include both rewards and penalties. Specifically, we propose extending PARP to continuous-valued input so that it pursues rewards and avoids penalties simultaneously. We applied our proposal to the pole-cart balancing problem and confirmed its validity.
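The abstract names PS-based credit assignment and PARP-style penalty avoidance without detailing their mechanics. For orientation only, the Python sketch below assumes a geometric credit-decay schedule, a simple bin discretization of the continuous input, and a rule that marks state-action pairs immediately preceding a penalty as forbidden; all identifiers and constants are illustrative assumptions, not the authors' algorithm from the paper.

# Minimal sketch: profit-sharing-style credit assignment plus
# PARP-style penalty-rule avoidance on a discretized continuous state.
# The decay constant, bin scheme, and names are assumptions, not the
# paper's method.
from collections import defaultdict

DECAY = 0.5                    # geometric credit decay along the episode trace
weights = defaultdict(float)   # rule (state_bin, action) -> accumulated credit
penalty_rules = set()          # rules observed to lead directly to a penalty

def discretize(x, bins=10, lo=-1.0, hi=1.0):
    # Map a continuous observation onto a coarse bin index (assumed scheme).
    i = int((x - lo) / (hi - lo) * bins)
    return max(0, min(bins - 1, i))

def update(trace, reward, penalized):
    # On a penalty, mark the final rule so the policy can avoid it;
    # on a reward, distribute geometrically decaying credit backward.
    if penalized:
        penalty_rules.add(trace[-1])
        return
    credit = reward
    for rule in reversed(trace):
        weights[rule] += credit
        credit *= DECAY

def select_action(state_bin, actions):
    # Greedy choice over rules not marked as penalty rules; if every
    # action is marked, fall back to the full action set.
    allowed = [a for a in actions if (state_bin, a) not in penalty_rules] or list(actions)
    return max(allowed, key=lambda a: weights[(state_bin, a)])

# Example: one rewarded episode on a one-dimensional continuous state.
trace = [(discretize(-0.3), 0), (discretize(0.1), 1)]
update(trace, reward=1.0, penalized=False)
print(select_action(discretize(0.1), [0, 1]))   # prints 1

The paper's handling of continuous-valued input in PARP is more involved; this sketch only shows how reward credit and penalty marking can coexist in one rule table, the combination the abstract describes.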
ISSN: 1343-0130 (print); 1883-8014 (online)
DOI: 10.20965/jaciii.2007.p0668