Active reward learning with a novel acquisition function

Bibliographic details
Published in:	Autonomous Robots, October 2015, Vol. 39, No. 3, pp. 389-405
Main authors:	Daniel, Christian; Kroemer, Oliver; Viering, Malte; Metz, Jan; Peters, Jan
Format:	Article
Language:	English
Keywords:	Artificial Intelligence; Bayesian analysis; Computer Imaging; Control; Divergence; Engineering; Gaussian process; Grasping (robotics); Mechatronics; Optimization; Pattern Recognition and Graphics; Pendulums; Robot learning; Robotics; Robotics and Automation; Robots; Vision

Description
Abstract:	Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework in which the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model using a Gaussian process and evaluate several classical acquisition functions (AFs) from the Bayesian optimization literature in this context. Furthermore, we present a novel AF, expected policy divergence. We demonstrate results of our method for a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel AF on a real robot pendulum swing-up task.
ISSN:	0929-5593 (print); 1573-7527 (online)
DOI:	10.1007/s10514-015-9454-z
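The abstract describes representing the reward model with a Gaussian process (GP) and using a classical Bayesian-optimization acquisition function to decide which outcomes the human expert should rate. The Python sketch below illustrates that idea under stated assumptions: each rollout outcome is summarized by a hypothetical numeric feature vector, the GP uses a squared-exponential kernel with fixed, hand-picked hyperparameters, and the query rule (ask for a rating of the candidate with the highest upper-confidence-bound score) is one classical acquisition function chosen for illustration. All names (GPRewardModel, select_query, ucb, rbf_kernel) are hypothetical. This is not the paper's implementation, and it reproduces neither the expected policy divergence acquisition function nor the policy update.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: GP reward model over rollout feature vectors plus a
# UCB acquisition function for choosing which rollout the expert should rate.
# Kernel, hyperparameters, and query rule are illustrative assumptions.


def rbf_kernel(a: Sequence[float], b: Sequence[float],
               length_scale: float = 1.0, signal_var: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_upper_transposed(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPRewardModel:
    """GP regression from rollout features to expert ratings (zero prior mean)."""

    def __init__(self, noise_var=0.1, length_scale=1.0, signal_var=1.0):
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.X: List[List[float]] = []
        self.y: List[float] = []

    def _k(self, a, b):
        return rbf_kernel(a, b, self.length_scale, self.signal_var)

    def add_rating(self, features, rating):
        """Store one expert rating and refit (O(n^3), fine for few queries)."""
        self.X.append(list(features))
        self.y.append(float(rating))
        n = len(self.X)
        K = [[self._k(self.X[i], self.X[j]) + (self.noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self._L = cholesky(K)
        # alpha = K^{-1} y, computed via two triangular solves
        self._alpha = solve_upper_transposed(self._L, solve_lower(self._L, self.y))

    def predict(self, features) -> Tuple[float, float]:
        """Posterior mean and variance of the latent reward at `features`."""
        if not self.X:
            return 0.0, self.signal_var
        k_star = [self._k(x, features) for x in self.X]
        mean = sum(k * a for k, a in zip(k_star, self._alpha))
        v = solve_lower(self._L, k_star)
        var = self._k(features, features) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)


def ucb(mean, var, beta=2.0):
    """Upper confidence bound: high for promising or uncertain outcomes."""
    return mean + beta * math.sqrt(var)


def select_query(model, candidates, beta=2.0) -> int:
    """Index of the candidate rollout whose rating the expert should be asked for."""
    scores = [ucb(*model.predict(c), beta=beta) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

In use, a robot would execute several rollouts, call select_query on their feature vectors, ask the expert to rate the selected one, and pass that rating to add_rating before updating its policy with the GP's predicted rewards. UCB favors outcomes that are either predicted to be highly rewarding or still uncertain; swapping in probability of improvement or expected improvement would only change the scoring function.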