Reinforcement Learning for Penalty Avoidance in Continuous State Spaces

Bibliographic Details
Published in: Journal of Advanced Computational Intelligence and Intelligent Informatics, 2007-07, Vol. 11 (6), p. 668-676
Main Authors: Miyazaki, Kazuteru; Kobayashi, Shigenobu
Format: Article
Language: English
Online Access: Full text
Abstract: Reinforcement learning involves learning to adapt to an environment through the presentation of rewards, special inputs that serve as clues. To obtain rational policies quickly, profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] are used; these are collectively called PS-based methods. When applying reinforcement learning to real problems, continuous-valued input must sometimes be handled. A method [10] based on RPM has been proposed as a PS-based method for continuous-valued input, but it assumes environments containing only rewards and cannot suitably handle penalties. We studied the treatment of continuous-valued input suited to PS-based methods in environments that include both rewards and penalties. Specifically, we propose extending PARP to continuous-valued input so that it pursues rewards and avoids penalties simultaneously. We applied our proposal to the pole-cart balancing problem and confirmed its validity.
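The abstract names PS-based credit assignment and PARP-style penalty avoidance without detailing their mechanics. For orientation only, the Python sketch below assumes a geometric credit-decay schedule, a simple bin discretization of the continuous input, and a rule that marks state-action pairs immediately preceding a penalty as forbidden; all identifiers and constants are illustrative assumptions, not the authors' algorithm from the paper.

# Minimal sketch: profit-sharing-style credit assignment plus
# PARP-style penalty-rule avoidance on a discretized continuous state.
# The decay constant, bin scheme, and names are assumptions, not the
# paper's method.
from collections import defaultdict

DECAY = 0.5                    # geometric credit decay along the episode trace
weights = defaultdict(float)   # rule (state_bin, action) -> accumulated credit
penalty_rules = set()          # rules observed to lead directly to a penalty

def discretize(x, bins=10, lo=-1.0, hi=1.0):
    # Map a continuous observation onto a coarse bin index (assumed scheme).
    i = int((x - lo) / (hi - lo) * bins)
    return max(0, min(bins - 1, i))

def update(trace, reward, penalized):
    # On a penalty, mark the final rule so the policy can avoid it;
    # on a reward, distribute geometrically decaying credit backward.
    if penalized:
        penalty_rules.add(trace[-1])
        return
    credit = reward
    for rule in reversed(trace):
        weights[rule] += credit
        credit *= DECAY

def select_action(state_bin, actions):
    # Greedy choice over rules not marked as penalty rules; if every
    # action is marked, fall back to the full action set.
    allowed = [a for a in actions if (state_bin, a) not in penalty_rules] or list(actions)
    return max(allowed, key=lambda a: weights[(state_bin, a)])

# Example: one rewarded episode on a one-dimensional continuous state.
trace = [(discretize(-0.3), 0), (discretize(0.1), 1)]
update(trace, reward=1.0, penalized=False)
print(select_action(discretize(0.1), [0, 1]))   # prints 1

The paper's handling of continuous-valued input in PARP is more involved; this sketch only shows how reward credit and penalty marking can coexist in one rule table, the combination the abstract describes.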
ISSN: 1343-0130 (print); 1883-8014 (online)
DOI: 10.20965/jaciii.2007.p0668