Goal Representation Heuristic Dynamic Programming on Maze Navigation

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2013-12, Vol. 24 (12), pp. 2038-2050
Main authors:	Ni, Zhen; He, Haibo; Wen, Jinyu; Xu, Xin
Format:	Article
Language:	English
Subjects:	Adaptive dynamic programming; Algorithmics. Computability. Computer arithmetics; Algorithms; Applied sciences; Artificial intelligence; Benchmark testing; Computer science; control theory; systems; Control theory. Systems; Convergence; Distance learning; Dynamic programming; Equations; Exact sciences and technology; goal representation heuristic dynamic programming; Heuristic; Learning; Learning and adaptive systems; Markov analysis; Markov decision process; Mathematical model; maze navigation/path planning; Navigation; Networks; Neural networks; reinforcement learning; Representations; Robotics; Theoretical computing

Description
Abstract:	Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper to demonstrate online learning in the Markov decision process. In addition to the (external) reinforcement signal used in the literature, we develop an adaptive internal goal/reward representation for the agent with the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and add a goal network that represents the internal goal signal, to further help the value function approximation. We evaluate the proposed GrHDP algorithm on two 2-D maze navigation problems, and later on one 3-D maze navigation problem. Compared with the traditional HDP approach, the learning performance of the agent is improved with the proposed GrHDP approach. We also report the learning performance of two other reinforcement learning algorithms, namely Sarsa(λ) and Q-learning, on the same benchmarks for comparison. Furthermore, to demonstrate the theoretical guarantee of the proposed method, we provide a characteristic analysis of the convergence of the neural network weights in the GrHDP approach.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2013.2271454
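
The abstract names Q-learning as one of the comparison baselines on maze navigation. For readers unfamiliar with that baseline, the following is a minimal sketch of tabular Q-learning on a small 2-D grid maze. It is not the paper's benchmark or its GrHDP method: the 4x4 layout, the reward of -1 per step, the hyperparameters, and all function names are assumptions chosen purely for illustration.

```python
import random

# Hypothetical 4x4 maze (an assumption, not a benchmark from the paper):
# S = start, G = goal, # = wall, . = free cell.
MAZE = ["S.#.",
        ".##.",
        "....",
        ".#.G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START, GOAL = (0, 0), (3, 3)


def step(state, action):
    """Move one cell; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        r, c = state
    next_state = (r, c)
    # Assumed external reinforcement signal: -1 on every move.
    return next_state, -1.0, next_state == GOAL


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> value estimate; missing entries count as 0.0
    n = len(ACTIONS)
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Off-policy target: bootstrap from the best action in the next state.
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(n))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START until GOAL or the step cap."""
    path, state = [START], START
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


if __name__ == "__main__":
    path = greedy_path(train())
    print(len(path) - 1, "moves:", path)
```

On this layout the shortest route from start to goal takes six moves, so the length of the greedy path gives a quick check on whether the table has been learned. Per the abstract, GrHDP differs from such a tabular baseline by using neural networks in an actor-critic design, with an additional goal network that supplies an internal goal signal.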