Acceleration of Reinforcement Learning by Controlled Use of Options Given as Prior Information


Bibliographic Details
Published in: SICE Journal of Control, Measurement, and System Integration, 2013, Vol. 6(4), pp. 252-258
Main authors: TERASHIMA, Kento; TAKANO, Hirotaka; MURATA, Junichi
Format: Article
Language: English
Description
Abstract: Reinforcement learning is a method by which an agent learns an appropriate action policy for solving a problem through trial and error. Its advantage is that it can be applied to unknown or uncertain problems. Its drawback is that the trial and error makes it slow to solve the problem. If prior information about the environment is available, some of the trial and error can be spared and learning can take less time. A human designer can provide the prior information in the form of options. However, the options can be wrong because of uncertainties in the problem. Using wrong options can have adverse effects, such as failing to obtain the optimal policy and slowing down learning. This paper proposes controlling the use of the options to suppress these adverse effects: the agent gradually forgets the given options as it learns a better policy. The proposed method is applied to three testbed environments and two types of prior information, and it shows good results in terms of both learning speed and the quality of the obtained policies.
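Illustrative sketch (not taken from the article): the abstract does not say how the given options are forgotten, so the Python example below is only one possible reading. It assumes a geometric decay of the probability of following the prior, reduces an option to a single suggested primitive action, and uses a hypothetical one-dimensional corridor with tabular Q-learning. All names and parameter values (option_use_probability, run, p0, decay, and so on) are assumptions made for illustration.

```python
import random


def option_use_probability(episode, p0=0.9, decay=0.95):
    """Probability of following the prior in a given episode.

    Assumption: geometric decay; the article may use a different rule.
    """
    return p0 * decay ** episode


def run(n_states=6, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
        prior_action=1, seed=0):
    """Tabular Q-learning on a corridor whose goal is the rightmost state.

    Actions: 0 = left, 1 = right. prior_action stands in for a
    (possibly wrong) option supplied by a designer.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for ep in range(episodes):
        p_opt = option_use_probability(ep)
        s = 0
        for _ in range(100):  # step cap per episode
            if rng.random() < p_opt:
                a = prior_action          # follow the prior information
            elif rng.random() < epsilon:
                a = rng.randrange(2)      # random exploration
            else:
                # greedy action, ties broken at random
                a = max((0, 1), key=lambda x: (q[s][x], rng.random()))
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    # A deliberately wrong prior ("always go left") is phased out over time.
    q = run(prior_action=0)
    print([0 if row[0] > row[1] else 1 for row in q[:-1]])
```

In this sketch the prior dominates early episodes and has almost no influence after roughly a hundred episodes, so a wrong prior delays learning but does not fix the final policy. Whether this matches the control scheme in the article cannot be determined from the abstract alone.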
ISSN: 1882-4889, 1884-9970
DOI: 10.9746/jcmsi.6.252