Learning to balance an NAO robot using reinforcement learning with symbolic inverse kinematic

Bibliographic details
Published in:	Transactions of the Institute of Measurement and Control 2017-11, Vol.39 (11), p.1735-1748
Main authors:	Tutsoy, Onder; Erol Barkana, Duygun; Colak, Sule
Format:	Article
Language:	English
Subjects:	Computer simulation; Control algorithms; Humanoid; Kinematics; Legs; Machine learning; Robots; Stability; Three dimensional bodies; Walking
Online access:	Full text

Description
Abstract:	An autonomous humanoid robot (HR) with learning and control algorithms is able to balance itself during sitting down, standing up, walking and running operations, as humans do. In this study, reinforcement learning (RL) with a complete symbolic inverse kinematic (IK) solution is developed to balance the full lower body of a three-dimensional (3D) NAO HR which has 12 degrees of freedom. The IK solution converts the lower body trajectories, which are learned by RL, into reference positions for the joints of the NAO robot. This reduces the dimensionality of the learning and control problems since the IK integrated with the RL eliminates the need to use whole HR states. The IK solution in 3D space takes into account not only the legs but also the full lower body; hence, it is possible to incorporate the effect of the foot and hip lengths on the IK solution. The accuracy and capability of following real joint states are evaluated in the simulation environment. MapleSim is used to model the full lower body, and the developed RL is combined with this model by utilizing Modelica and Maple software properties. The results of the simulation show that the value function is maximized, temporal difference error is reduced to zero, the lower body is stabilized at the upright, and the convergence speed of the RL is improved with use of the symbolic IK solution.
ISSN:	0142-3312 1477-0369
DOI:	10.1177/0142331216645176