An adjustment method of the number of states on Q-learning segmenting state space adaptively

Bibliographic Details
Published in: Electronics and Communications in Japan, Part 2: Electronics, 2007-09, Vol. 90(9), pp. 75-86
Authors: Hamagami, Tomoki; Koakutsu, Seiichi; Hirata, Hironori
Format: Article
Language: English
Online access: Full text
Description
Abstract: The results of imposing limitations on the number of states and of promoting the splitting of states in Q-learning are presented. Q-learning is a common reinforcement learning method in which the learning agent autonomously segments the environment states. When the designer of an agent cannot explicitly provide the boundaries of states in the environment in which the agent acts, the agent must learn while autonomously determining the internal discrete states needed to take appropriate actions. A simple method of segmenting states based on a reinforcement signal (QLASS) has been proposed for this purpose. However, the original method suffers from the problem that the number of states grows excessively large as learning proceeds. A method is therefore proposed that defines temperature and eligibility attributes for each of the agent's internal discrete states, and that limits or expands the number of internal discrete states and promotes random actions according to the values of these attributes. The results of applying the proposed method to several tasks, including tasks with a dynamic environment, are compared to the QLASS method using only the reinforcement signal; a similar level of learning is achieved with fewer states. Furthermore, tasks can be completed in a small number of steps even when only a small number of trials are used for learning. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 90(9): 75-86, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20383
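
The abstract names the key mechanisms: per-state temperature and eligibility attributes, a cap on the number of internal states, and temperature-driven random actions. The minimal Python sketch below illustrates how such mechanisms might fit together for a one-dimensional observation space; the class names, thresholds, and update rules are assumptions for illustration only, not the authors' QLASS algorithm or the method of this paper.

```python
# Minimal sketch of the ideas the abstract describes: Q-learning over an
# adaptively segmented 1-D state space, where each internal state carries
# a "temperature" attribute (drives random exploration) and an "eligibility"
# attribute (gates splitting under a cap on the total number of states).
# All names, thresholds, and update rules are illustrative assumptions,
# NOT the authors' QLASS algorithm.
import random
from dataclasses import dataclass, field


@dataclass
class InternalState:
    low: float                  # lower boundary of this segment
    high: float                 # upper boundary of this segment
    q: dict = field(default_factory=lambda: {"left": 0.0, "right": 0.0})
    temperature: float = 1.0    # hot states act randomly more often
    eligibility: float = 0.0    # running trace of recent |TD error|


class AdaptiveQAgent:
    def __init__(self, max_states=32, alpha=0.1, gamma=0.9):
        self.states = [InternalState(0.0, 1.0)]  # start with one coarse state
        self.max_states = max_states             # hard cap on state count
        self.alpha, self.gamma = alpha, gamma

    def locate(self, obs):
        obs = min(max(obs, 0.0), 1.0)  # clamp to the segmented range
        return next(s for s in self.states if s.low <= obs <= s.high)

    def act(self, state):
        # Temperature-dependent exploration: hot (young/uncertain) states
        # choose random actions; cool states act greedily.
        if random.random() < min(1.0, state.temperature):
            return random.choice(list(state.q))
        return max(state.q, key=state.q.get)

    def update(self, obs, action, reward, next_obs):
        s, s_next = self.locate(obs), self.locate(next_obs)
        td = reward + self.gamma * max(s_next.q.values()) - s.q[action]
        s.q[action] += self.alpha * td
        s.temperature *= 0.99                          # cool with each visit
        s.eligibility = 0.9 * s.eligibility + abs(td)  # trace of TD surprise
        # Split a persistently surprising state, but only under the cap.
        if s.eligibility > 1.0 and len(self.states) < self.max_states:
            self._split(s)

    def _split(self, s):
        # Halve the segment; the child inherits the Q-values but starts hot,
        # which promotes random actions in the newly created state.
        mid = (s.low + s.high) / 2.0
        self.states.append(InternalState(mid, s.high, dict(s.q), temperature=1.0))
        s.high, s.eligibility = mid, 0.0


# Toy usage: reach the region above 0.8 on a 1-D line (illustrative only).
agent = AdaptiveQAgent()
pos = 0.5
for _ in range(2000):
    state = agent.locate(pos)
    action = agent.act(state)
    new_pos = min(max(pos + (0.05 if action == "right" else -0.05), 0.0), 1.0)
    reward = 1.0 if new_pos > 0.8 else 0.0
    agent.update(pos, action, reward, new_pos)
    pos = 0.5 if reward else new_pos  # reset the episode on success
print(f"learned with {len(agent.states)} internal states")
```

In this sketch a state splits only while its eligibility trace of recent TD errors stays large and the state count remains under the cap, and a freshly split state starts hot so that it explores randomly; the paper defines these attributes and their update rules precisely, so consult the full text for the actual method.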
ISSN: 8756-663X, 1520-6432, 0915-1893
DOI: 10.1002/ecjb.20383