Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative m...
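The abstract is truncated and does not name the two algorithms it analyzes. The following Python/NumPy sketch illustrates what model-based RL with a generative model typically means. It calls a simulator a fixed number of times for each state-action pair, builds the empirical transition model from those samples, and runs Q-value iteration on that model to estimate the optimal action-value function. The function names, the toy two-state MDP, and every parameter value are hypothetical. This is a generic illustration, not a claim about the authors' exact algorithms or their analysis.

```python
import numpy as np

def empirical_model(sample_next_state, n_states, n_actions, n_samples, rng):
    # Estimate P_hat(y | x, a) by calling the (hypothetical) generative model
    # n_samples times for every state-action pair (x, a).
    p_hat = np.zeros((n_states, n_actions, n_states))
    for x in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                y = sample_next_state(x, a, rng)
                p_hat[x, a, y] += 1.0
    return p_hat / n_samples

def q_value_iteration(p_hat, reward, gamma, n_iters):
    # Apply the empirical Bellman optimality operator n_iters times, starting from Q = 0:
    # Q(x, a) <- r(x, a) + gamma * sum_y P_hat(y | x, a) * max_b Q(y, b)
    q = np.zeros_like(reward)
    for _ in range(n_iters):
        v = q.max(axis=1)                 # V(y) = max_b Q(y, b), shape (S,)
        q = reward + gamma * (p_hat @ v)  # (S, A, S) @ (S,) -> (S, A)
    return q

def switch_or_stay(x, a, rng):
    # Hypothetical toy simulator: action 0 keeps the state, action 1 switches it.
    # It is deterministic, so rng is unused here.
    return x if a == 0 else 1 - x

rng = np.random.default_rng(0)
reward = np.array([[0.0, 0.0],   # reward 0 in state 0
                   [1.0, 1.0]])  # reward 1 in state 1
p_hat = empirical_model(switch_or_stay, n_states=2, n_actions=2, n_samples=10, rng=rng)
q_hat = q_value_iteration(p_hat, reward, gamma=0.9, n_iters=500)
```

Because the toy simulator is deterministic, the empirical model is exact here, and `q_hat` converges to the optimal values (for example, staying in state 1 forever is worth 1 / (1 - 0.9) = 10). With a stochastic simulator, the error of `q_hat` would depend on `n_samples`. That sample size per state-action pair is the quantity that sample-complexity bounds of this kind are about.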

Bibliographic details
Published in:	Machine Learning, 2013-06, Vol. 91 (3), pp. 325-349
Main authors:	Gheshlaghi Azar, Mohammad; Munos, Rémi; Kappen, Hilbert J.
Format:	Article
Language:	English
Keywords:	Applied sciences Artificial Intelligence Complexity Computer Science Computer science control theory systems Control Decision making models Exact sciences and technology Learning Learning and adaptive systems Lower bounds Machine Learning Mathematical models Mechatronics Minimax technique Natural Language Processing (NLP) Optimization Policies Reinforcement Robotics Simulation and Modeling