Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis

Bibliographic Details
Published in:	IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018-06, Vol.48 (6), p.875-891
Main authors:	Wei, Qinglai; Lewis, Frank L.; Liu, Derong; Song, Ruizhuo; Lin, Hanquan
Format:	Article
Language:	English
Subjects:	Adaptive algorithms; Adaptive critic designs; adaptive dynamic programming (ADP); Aerospace electronics; Algorithms; approximate dynamic programming; Approximation algorithms; Computer simulation; Control theory; Convergence; Dynamic programming; Iterative algorithms; Iterative methods; local iteration; Machine learning; neural networks; neuro-dynamic programming; Nonlinear systems; Optimal control

Description
Summary:	In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The present local iterative ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. Employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively relaxes the computational burden. A new analysis method for the convergence property is developed to prove that the iterative value functions will converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions of the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
ISSN:	2168-2216; 2168-2232
DOI:	10.1109/TSMC.2016.2623766
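
Illustration: the summary describes updating the value function only on a subset of the state space, controlled by a state-dependent learning rate. The sketch below is one plausible reading of that idea and is not taken from the paper. It assumes a small finite state space with deterministic transitions and an update of the form V_{i+1}(x) = (1 - alpha_i(x)) V_i(x) + alpha_i(x) min_u [U(x, u) + V_i(F(x, u))], where alpha_i(x) = 0 leaves state x untouched in iteration i. The names `local_value_iteration`, `alternating_rate`, `cost`, and `next_state` are hypothetical, as is the five-state chain used as a test problem.

```python
import numpy as np


def local_value_iteration(cost, next_state, v0, alpha_fn, sweeps):
    """Tabular sketch of a local value iteration (assumed form, see lead-in).

    cost[s, a]       stage cost U(x, u) for state s and action a
    next_state[s, a] index of the deterministic successor state F(x, u)
    v0               initial value function (nonnegative)
    alpha_fn(i)      array of per-state learning rates in [0, 1] for sweep i
    """
    v = np.asarray(v0, dtype=float).copy()
    for i in range(sweeps):
        alpha = alpha_fn(i)
        # One-step lookahead target: min over actions of cost plus successor value.
        target = np.min(cost + v[next_state], axis=1)
        # States with alpha == 0 keep their old value (the "local" part).
        v = (1.0 - alpha) * v + alpha * target
    return v


# Hypothetical test problem: a chain of 5 states, state 0 is the goal.
n = 5
states = np.arange(n)
# Action 0 moves left, action 1 moves right (clipped at the ends).
next_state = np.stack(
    [np.maximum(states - 1, 0), np.minimum(states + 1, n - 1)], axis=1
)
# Unit cost per step everywhere except at the goal state.
cost = np.where(states[:, None] > 0, 1.0, 0.0) * np.ones((n, 2))


def alternating_rate(i):
    # Even sweeps update even-indexed states, odd sweeps update odd-indexed ones.
    return np.where(states % 2 == i % 2, 0.5, 0.0)


v = local_value_iteration(cost, next_state, np.zeros(n), alternating_rate, sweeps=200)
print(np.round(v, 6))
```

In this toy problem the optimal cost-to-go from state s is s steps, so the iterate approaches [0, 1, 2, 3, 4] even though each sweep touches only about half of the states. The zero initial function is a special case of the positive semidefinite initialization mentioned in the summary.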