Uniformly Conservative Exploration in Reinforcement Learning

A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes. We propose a natural constraint on exploration -- \textit{uniformly} outperforming a conservative policy (adaptively estimated from all data observed thus far), up to a...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Xu, Wanqiao, Ma, Jason Yecheng, Xu, Kan, Bastani, Hamsa, Bastani, Osbert
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Learning Statistics - Machine Learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!