On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

We present the first finite-time global convergence analysis of policy gradient in the context of infinite-horizon average-reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action spaces. Our analysis shows that the policy gradient iterates...
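
To make the setting concrete, here is a minimal sketch of policy gradient on a small ergodic tabular MDP under the average-reward criterion. It assumes direct (tabular) parameterization, where each row pi[s] is a probability vector, and projected gradient ascent onto the probability simplex. The abstract above is truncated, so this is not claimed to be the paper's exact algorithm, step size, or rate. The MDP (P, R), the helper names (average_reward, policy_gradient, project_simplex, projected_policy_gradient), the step size eta, and the iteration counts are all hypothetical choices for illustration. The gradient uses the average-reward policy gradient theorem, d rho / d pi(a|s) = d_pi(s) Q_pi(s,a), with Q_pi the differential action-value function.

```python
# Hypothetical 2-state, 2-action MDP. Every transition probability is positive,
# so every policy induces an ergodic, aperiodic Markov chain.
# P[s][a][s2] = Pr(next state s2 | state s, action a); R[s][a] = expected reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
S, A = 2, 2


def chain(pi):
    """Transition matrix P_pi and reward vector r_pi induced by policy pi."""
    P_pi = [[sum(pi[s][a] * P[s][a][s2] for a in range(A)) for s2 in range(S)]
            for s in range(S)]
    r_pi = [sum(pi[s][a] * R[s][a] for a in range(A)) for s in range(S)]
    return P_pi, r_pi


def stationary(P_pi, iters=200):
    """Stationary distribution d_pi via power iteration (valid for ergodic, aperiodic chains)."""
    d = [1.0 / S] * S
    for _ in range(iters):
        d = [sum(d[s] * P_pi[s][s2] for s in range(S)) for s2 in range(S)]
    return d


def average_reward(pi):
    """rho(pi) = sum_s d_pi(s) * r_pi(s)."""
    P_pi, r_pi = chain(pi)
    d = stationary(P_pi)
    return sum(d[s] * r_pi[s] for s in range(S))


def policy_gradient(pi, horizon=200):
    """Gradient of rho w.r.t. the tabular entries pi[s][a]: d_pi(s) * Q_pi(s, a)."""
    P_pi, r_pi = chain(pi)
    d = stationary(P_pi)
    rho = sum(d[s] * r_pi[s] for s in range(S))
    # Differential value V = sum_t (P_pi^t r_pi - rho), truncated after `horizon` terms.
    V, x = [0.0] * S, r_pi[:]
    for _ in range(horizon):
        V = [V[s] + x[s] - rho for s in range(S)]
        x = [sum(P_pi[s][s2] * x[s2] for s2 in range(S)) for s in range(S)]
    # Differential action-value: Q(s,a) = r(s,a) - rho + sum_s2 P(s2|s,a) V(s2).
    Q = [[R[s][a] - rho + sum(P[s][a][s2] * V[s2] for s2 in range(S))
          for a in range(A)] for s in range(S)]
    return [[d[s] * Q[s][a] for a in range(A)] for s in range(S)]


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def projected_policy_gradient(steps=200, eta=0.5):
    """Gradient ascent on rho, projecting each state's action distribution back onto the simplex."""
    pi = [[1.0 / A] * A for _ in range(S)]  # start from the uniform policy
    for _ in range(steps):
        g = policy_gradient(pi)
        pi = [project_simplex([pi[s][a] + eta * g[s][a] for a in range(A)])
              for s in range(S)]
    return pi


if __name__ == "__main__":
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    pi = projected_policy_gradient()
    print("rho(uniform) =", average_reward(uniform))
    print("learned policy:", pi, "rho =", average_reward(pi))
```

In this toy MDP, taking action 1 in both states is optimal, with average reward 2 * 16/17 = 32/17 (about 1.882). The iterates should move toward that deterministic policy from the uniform start. The simplex projection is invariant to adding the same constant to every coordinate, so the -rho term in Q does not change the update.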

Bibliographic details
Main authors:	Kumar, Navdeep, Murthy, Yashaswini, Shufaro, Itai, Levy, Kfir Y, Srikant, R, Mannor, Shie
Format:	Article
Language:	English
Subjects:	Computer Science - Learning; Computer Science - Systems and Control