Temporal Difference Learning with Piecewise Linear Basis

Detailed description

Bibliographic details
Published in: 电子学报:英文版 (Chinese Journal of Electronics), 2014, Vol. 23 (1), pp. 49-54
Authors: CHEN Xingguo, GAO Yang, FAN Shunguo
Format: Article
Language: English
Online access: Full text
Description
Summary: The temporal difference (TD) learning family tries to learn a least-squares solution of an approximate linear value function (LVF) to deal with large-scale and/or continuous reinforcement learning problems. However, due to the representation ability of the features in the LVF, the predictive error of the learned LVF is bounded by the residual between the optimal value function and the projected optimal value function. In this paper, Temporal difference learning with a Piecewise linear basis (PLB-TD) is proposed to further decrease the error bounds. PLB-TD consists of two steps: (1) build the piecewise linear basis for problems of different dimensions; (2) learn the parameters via well-known members of the TD learning family (linear TD, GTD, GTD2, or TDC), whose per-step complexity is O(n). The error bounds are proved to decrease to zero as the size of the piecewise basis goes to infinity. Empirical results demonstrate the effectiveness of the proposed algorithm.
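The two steps described in the abstract can be illustrated with a minimal sketch. The paper itself is not reproduced here, so the basis construction below is an assumption: a standard one-dimensional "hat" (tent) basis over a knot grid, combined with the classic linear TD(0) update, which costs O(n) per step in the number of basis functions. Function names (`plb_features`, `linear_td0`) and the toy chain task are hypothetical, chosen only for illustration.

```python
import numpy as np

def plb_features(s, knots):
    """Step (1), assumed form: piecewise linear 'hat' basis over a 1-D knot
    grid. Each feature is a tent function peaking at one knot, so any
    piecewise linear function on the grid is exactly representable."""
    phi = np.zeros(len(knots))
    s = float(np.clip(s, knots[0], knots[-1]))
    i = int(np.searchsorted(knots, s, side="right")) - 1
    i = min(i, len(knots) - 2)           # keep s in the last interval at the boundary
    left, right = knots[i], knots[i + 1]
    t = (s - left) / (right - left)      # linear interpolation weight
    phi[i] = 1.0 - t
    phi[i + 1] = t
    return phi

def linear_td0(episodes, knots, alpha=0.1, gamma=0.9):
    """Step (2): plain linear TD(0) on the piecewise linear features.
    Each update touches only the feature vector, hence O(n) per step."""
    theta = np.zeros(len(knots))
    for episode in episodes:
        for (s, r, s_next, done) in episode:
            phi = plb_features(s, knots)
            v = theta @ phi
            v_next = 0.0 if done else theta @ plb_features(s_next, knots)
            delta = r + gamma * v_next - v   # TD error
            theta += alpha * delta * phi     # O(n) semi-gradient update
    return theta
```

On a two-transition toy chain (state 0.0 → 0.5 with reward 0, then 0.5 → 1.0 with reward 1, terminal), the hat basis with knots at 0, 0.5, and 1 reduces to one-hot features at the knots, and repeated TD(0) sweeps drive the weights toward the true discounted values V(0.5) = 1 and V(0.0) = 0.9. The abstract's other cited learners (GTD, GTD2, TDC) would replace only the update rule, not the basis.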
ISSN:1022-4653