Learning motions from demonstrations and rewards with time-invariant dynamical systems based policies
Saved in:
Published in: Autonomous Robots, 2018, Vol. 42 (1), pp. 45–64
Main authors:
Format: Article
Language: English
Subjects:
Online access: Full text
Summary: An important challenge when using reinforcement learning for learning motions in robotics is the choice of parameterization for the policy. We use Gaussian Mixture Regression to extract a parameterization with relevant non-linear features from a set of demonstrations of a motion, following the paradigm of learning from demonstration. The resulting parameterization takes the form of a non-linear time-invariant dynamical system (DS). We use this time-invariant DS as a parameterized policy for a variant of the PI² policy search algorithm. This paper contributes by adapting PI² for our time-invariant motion representation. We introduce two novel parameter exploration schemes that can be used to (1) sample model parameters to achieve a uniform exploration in state space and (2) explore while ensuring stability of the resulting motion model. Additionally, a state-dependent stiffness profile is learned simultaneously with the reference trajectory, and both are used together in a variable impedance control architecture. This learning architecture is validated in a hardware experiment consisting of a digging task using a KUKA LWR platform.
ISSN: 0929-5593, 1573-7527
DOI: 10.1007/s10514-017-9636-y