Temporal Difference Learning with Piecewise Linear Basis
Published in: 电子学报:英文版 (Chinese Journal of Electronics), 2014, Vol. 23 (1), pp. 49-54
Format: Article
Language: English
Online access: Full text
Abstract: The Temporal difference (TD) learning family tries to learn a least-squares solution of an approximate linear value function (LVF) to deal with large-scale and/or continuous reinforcement learning problems. However, due to the representational ability of the features in the LVF, the predictive error of the learned LVF is bounded by the residual between the optimal value function and the projected optimal value function. In this paper, Temporal difference learning with piecewise linear basis (PLB-TD) is proposed to further decrease the error bounds. PLB-TD consists of two steps: (1) build the piecewise linear basis for problems of different dimensions; (2) learn the parameters via well-known members of the TD learning family (linear TD, GTD, GTD2 or TDC), whose complexity is O(n). The error bounds are proved to decrease to zero as the size of the piecewise basis goes to infinity. The empirical results demonstrate the effectiveness of the proposed algorithm.
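The two steps described in the abstract can be sketched in miniature: build a piecewise linear ("hat"/tent) basis, then run a linear TD(0) update, which costs O(n) per step. This is a minimal illustration under assumed simplifications, not the paper's PLB-TD construction: it uses a 1-D state space on [0, 1], evenly spaced knots, and a toy chain whose states are drawn i.i.d. uniform with reward r = s, chosen so the true value function is affine and lies exactly in the span of the basis. All function names here (`hat_features`, `linear_td0`) are illustrative, not from the paper.

```python
import numpy as np

def hat_features(s, centers, width):
    # Piecewise linear ("hat"/tent) basis over evenly spaced knots.
    # The features form a partition of unity, so any piecewise linear
    # function over the knots -- including affine functions -- is
    # representable exactly.
    return np.maximum(0.0, 1.0 - np.abs(s - centers) / width)

def linear_td0(n_basis=11, gamma=0.5, alpha=0.01, n_steps=100_000, seed=0):
    # Toy chain (an assumption for this sketch): states are i.i.d.
    # uniform on [0, 1] and r = s, so the true value function is
    # V(s) = s + gamma * 0.5 / (1 - gamma), which is affine and hence
    # exactly representable by the hat basis.
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.0, 1.0, n_basis)
    width = centers[1] - centers[0]
    theta = np.zeros(n_basis)
    s = rng.uniform()
    for _ in range(n_steps):
        s_next = rng.uniform()
        r = s
        phi = hat_features(s, centers, width)
        phi_next = hat_features(s_next, centers, width)
        delta = r + gamma * theta @ phi_next - theta @ phi  # TD error
        theta += alpha * delta * phi  # O(n) semi-gradient TD(0) step
        s = s_next
    # Return the learned value-function approximation.
    return lambda x: float(theta @ hat_features(x, centers, width))

V = linear_td0()
# With gamma = 0.5 the true value is V(s) = s + 0.5, so e.g.
# V(0.5) should come out close to 1.0.
```

Because the true value function here is representable, the learned approximation converges to it; the paper's point is that as the number of piecewise linear basis functions grows, this representability gap vanishes for general value functions as well.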
ISSN: 1022-4653