A parallel scheduling algorithm for reinforcement learning in large state space

Bibliographic Details
Published in:	Frontiers of Computer Science 2012-12, Vol.6 (6), p.631-646
Main Authors:	LIU, Quan, YANG, Xudong, JING, Ling, LI, Jin, LI, Jiao
Format:	Article
Language:	English
Subjects:	Algorithms; Computer Science; continuous state space; Convergence; divide-and-conquer strategy; large state space; Machine learning; parallel schedule; Priority scheduling; Research Article; scalability; Scheduling; 分而治之; 可扩展性; 大型空间; 学习方法; 并行方法; 强化学习; 状态空间; 调度算法

Description
Abstract:	The main challenge in reinforcement learning is scaling up to larger and more complex problems. To address this scaling problem, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. The component solutions are then recombined to obtain the desired result. To decide which subproblems the scheduler should handle first, a weighted priority scheduling algorithm is proposed. This scheduling algorithm focuses computation on the regions of the problem space that are expected to be maximally productive. To speed up learning, a new parallel method, DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture. In DCS-SPRL, the subproblems are distributed among processors that can work in parallel. The experimental results show that learning based on DCS-SPRL converges quickly and scales well.
ISSN:	1673-7350 2095-2228 1673-7466 2095-2236
DOI:	10.1007/s11704-012-1098-y
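
Illustrative sketch (not from the article): the abstract describes splitting a large state space into subproblems, solving each one independently, and letting a weighted priority scheduler pick which subproblem to work on next. The Python below is a minimal illustration of that general idea only, not the DCS-SRL or DCS-SPRL algorithm itself. Everything in it is an assumption made for illustration: the toy 20-state chain, the use of value iteration as the per-subproblem learner, the weighted Bellman residual as the priority, and all names (`backup`, `residual`, `solve_block`, `prioritized_solve`).

```python
import numpy as np

GAMMA = 0.9      # discount factor (assumed)
N_STATES = 20    # toy chain 0..19; state 19 is the terminal goal (assumed)
BLOCK = 5        # number of states per subproblem (assumed)


def backup(v, s):
    """One Bellman backup for state s: move left or right, reward -1 per move."""
    if s == N_STATES - 1:
        return 0.0                       # terminal goal state
    left, right = max(s - 1, 0), s + 1
    return max(-1.0 + GAMMA * v[left], -1.0 + GAMMA * v[right])


def residual(v, block):
    """Largest Bellman error inside a block (hypothetical priority signal)."""
    return max(abs(backup(v, s) - v[s]) for s in block)


def solve_block(v, block, tol=1e-8):
    """Solve one subproblem: sweep only this block, holding outside values fixed."""
    while residual(v, block) > tol:
        for s in block:
            v[s] = backup(v, s)


def prioritized_solve(weights=None, tol=1e-6):
    """Repeatedly solve the unconverged block with the highest weighted residual."""
    v = np.zeros(N_STATES)
    blocks = [range(i, min(i + BLOCK, N_STATES))
              for i in range(0, N_STATES, BLOCK)]
    if weights is None:
        weights = [1.0] * len(blocks)
    n_solved = 0
    while True:
        res = [residual(v, b) for b in blocks]
        pending = [i for i, r in enumerate(res) if r > tol]
        if not pending:                  # every subproblem is consistent
            return v, n_solved
        best = max(pending, key=lambda i: weights[i] * res[i])
        solve_block(v, blocks[best])
        n_solved += 1
```

Calling `prioritized_solve()` returns the value estimates for the whole chain together with the number of subproblem solves the scheduler performed. Passing `weights` (one per block) biases the scheduler toward particular regions. A parallel variant in the spirit of the abstract would hand several high-priority blocks to separate workers at once; that is not shown here.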