Sparse online kernelized actor-critic learning in reproducing kernel Hilbert space

Bibliographic details
Published in:	The Artificial intelligence review 2022, Vol.55 (1), p.23-58
Main authors:	Yang, Yongliang; Zhu, Hufei; Zhang, Qichao; Zhao, Bo; Li, Zhenning; Wunsch, Donald C.
Format:	Article
Language:	English
Subjects:	Adaptive control; Algorithms; Analysis; Artificial Intelligence; Computer Science; Data collection; Data mining; Dictionaries; Distance learning; Hilbert space; Kernels; Laws, regulations and rules; Machine learning; Nonlinear systems; Nonparametric statistics; Optimal control; Optimization; Variable structure control
Online access:	Full text

Description
Abstract:	In this paper, we develop a novel non-parametric online actor-critic reinforcement learning (RL) algorithm to solve optimal regulation problems for a class of continuous-time affine nonlinear dynamical systems. To handle value function approximation (VFA) with an inherently nonlinear and unknown structure, a reproducing kernel Hilbert space (RKHS)-based kernelized method is designed through online sparsification, where the dictionary size is fixed and the dictionary consists of updated elements. In addition, a linear independence check condition, i.e., an online criterion, is designed to determine whether the online data should be inserted into the dictionary. The RKHS-based kernelized VFA has a variable structure in accordance with the online data collection, which differs from classical parametric VFA methods with a fixed structure. Furthermore, we develop a sparse online kernelized actor-critic RL method to learn the unknown optimal value function and the optimal control policy in an adaptive fashion. Convergence of the presented kernelized actor-critic learning method to the optimum is established, and the boundedness of the closed-loop signals during the online learning phase is guaranteed. Finally, a simulation example is conducted to demonstrate the effectiveness of the presented kernelized actor-critic learning algorithm.
ISSN:	0269-2821 1573-7462
DOI:	10.1007/s10462-021-10045-9
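
Illustrative sketch (not taken from the article): the abstract mentions a linear independence check that decides whether an online sample enters a fixed-size dictionary. This record does not state the article's actual criterion, so the code below uses the approximate linear dependence test commonly used in kernel recursive least squares as a stand-in. The Gaussian kernel, the threshold value, the size limit, the drop-oldest replacement rule, and all function names are assumptions made only for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def ald_residual(dictionary, x, kernel=gaussian_kernel, jitter=1e-10):
    """Squared feature-space distance from k(., x) to the span of the dictionary.

    A small residual means x is (approximately) linearly dependent on the
    stored elements in the RKHS.
    """
    if len(dictionary) == 0:
        return kernel(x, x)
    # Gram matrix of the dictionary and kernel vector against the new sample.
    gram = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    k_vec = np.array([kernel(a, x) for a in dictionary])
    # Jitter keeps the solve well posed if the Gram matrix is near-singular.
    coeffs = np.linalg.solve(gram + jitter * np.eye(len(dictionary)), k_vec)
    return kernel(x, x) - float(k_vec @ coeffs)


def update_dictionary(dictionary, x, threshold=0.1, max_size=5):
    """Insert x only if it is sufficiently independent of the dictionary.

    Assumption: when the size limit is exceeded, the oldest element is
    dropped. The article's own update rule is not given in this record.
    """
    if ald_residual(dictionary, x) <= threshold:
        return dictionary, False
    updated = list(dictionary) + [np.asarray(x, dtype=float)]
    if len(updated) > max_size:
        updated = updated[1:]
    return updated, True


if __name__ == "__main__":
    samples = [[0.0, 0.0], [0.01, 0.0], [3.0, 0.0], [6.0, 0.0]]
    dictionary = []
    for sample in samples:
        dictionary, added = update_dictionary(dictionary, sample)
        print(sample, "added" if added else "skipped", len(dictionary))
```

Under these assumptions, a sample close to an existing dictionary element yields a near-zero residual and is skipped, while a distant sample is inserted, so the number of kernel terms in the value function approximation stays bounded by `max_size`.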