Model reference output feedback control using episodic natural actor-critic


Bibliographic Details
Main Authors: Zhou Fang, Chuanchuan Hao, Ping Li
Format: Conference Proceeding
Language: English
Description
Summary: In this paper, we develop a novel reinforcement learning algorithm that requires only the system output and converges to an optimal output feedback control policy with the expected dynamic performance. An informative reward function based on a reference model is adopted to represent the desired closed-loop performance intuitively, which significantly reduces the difficulty of reward construction. A stochastic output feedback control policy based on the PID law is used to relax the complete-observability requirement. The episodic Natural Actor-Critic (eNAC) algorithm is used for the policy search. Simulations on a second-order unstable system and a nonlinear LPV model of a UAV's longitudinal dynamics demonstrate the efficiency of the proposed algorithm.
ISSN: 2163-5137
DOI: 10.1109/ISIE.2012.6237275
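
As a rough illustration of the approach the summary describes, the sketch below combines a Gaussian-exploration PID output feedback policy, a model-reference reward of the assumed quadratic form r_t = -(y_t - y_t^ref)^2, and the standard episodic Natural Actor-Critic estimator (a least-squares fit of episode returns against accumulated score functions, with a constant baseline). The plant, reference model, exploration noise, step sizes, and episode counts are all illustrative assumptions, not the settings used in the paper.

```python
# Hedged sketch: eNAC search over stochastic PID output-feedback gains with a
# model-reference reward. Plant, reference model, and hyperparameters below
# are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.05, 200            # step size and episode length (assumed)
sigma = 0.2                  # exploration std of the Gaussian policy (assumed)

# Assumed discretized second-order unstable plant: x' = A x + B u, y = C x.
A = np.array([[1.0, dt], [0.05 * dt, 1.0]])
B = np.array([0.0, dt])
C = np.array([1.0, 0.0])

# Assumed stable second-order reference model producing the desired output y_ref.
Ar = np.array([[1.0, dt], [-4.0 * dt, 1.0 - 2.8 * dt]])
Br = np.array([0.0, 4.0 * dt])

def rollout(theta, r_cmd=1.0):
    """One episode: returns the summed score function and the episode return."""
    x, xr = np.zeros(2), np.zeros(2)
    integ, prev_e = 0.0, 0.0
    psi, ret = np.zeros(3), 0.0
    for _ in range(T):
        y, y_ref = C @ x, C @ xr
        e = r_cmd - y                              # tracking error (output only)
        integ += e * dt
        phi = np.array([e, integ, (e - prev_e) / dt])  # PID features
        prev_e = e
        mean_u = theta @ phi                       # PID law as the policy mean
        u = mean_u + sigma * rng.standard_normal() # Gaussian exploration
        psi += (u - mean_u) * phi / sigma**2       # grad_theta log pi(u | phi)
        x = A @ x + B * u
        xr = Ar @ xr + Br * r_cmd
        ret += -(y - y_ref) ** 2                   # model-reference reward
    return psi, ret

theta = np.zeros(3)                                # (Kp, Ki, Kd), from scratch
alpha, n_ep = 0.05, 20
for _ in range(200):
    Psi, R = [], []
    for _ in range(n_ep):
        psi, ret = rollout(theta)
        Psi.append(np.append(psi, 1.0))            # trailing 1: constant baseline
        R.append(ret)
    # eNAC: least squares Psi @ [w; b] ~ R; w estimates the natural gradient.
    sol, *_ = np.linalg.lstsq(np.array(Psi), np.array(R), rcond=None)
    w = sol[:3]
    theta += alpha * w / (np.linalg.norm(w) + 1e-8)  # normalized natural step
print("learned PID gains (Kp, Ki, Kd):", theta)
```

Note that the policy here observes only the tracking error and its integral and derivative, so no state estimate is required, which mirrors the output-feedback setting the summary emphasizes; the normalized natural-gradient step is one plausible way to keep a from-scratch gain search stable on an unstable plant.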