Reinforcement Learning with Energy-Exchange Dynamics for Spring-loaded Biped Robot Walking

Bibliographic details
Published in:	IEEE Robotics and Automation Letters, 2023-10, Vol. 8 (10), p. 1-8
Main authors:	Kuo, Cheng-Yu; Shin, Hirofumi; Matsubara, Takamitsu
Format:	Article
Language:	English
Keywords:	Actuators; Energy; Energy conservation law; Exchanging; Hardware; Humanoid and Bipedal Locomotion; Legged locomotion; Model Learning for Control; Onsite; Planning; Probabilistic models; Real time; Reinforcement Learning; Robot dynamics; Robots; Task analysis; Training; Trajectory; Walking
Online access:	Full text

Description
Abstract:	This paper presents a probabilistic Model-based Reinforcement Learning (MBRL) approach for learning the Energy-exchange Dynamics (EED) of a spring-loaded biped robot. Our approach enables on-site walking acquisition with high sample efficiency, real-time planning capability, and generalizability across skill conditions. Specifically, we learn the robot's data-driven state transition dynamics formulated in terms of energy states, with their interactions characterized as energy exchange to reduce dimensionality. To improve planning reliability with the learned EED, we design a control space based on a walking trajectory that follows the law of conservation of energy and is formulated in terms of energy states. We evaluated our approach on a four-degree-of-freedom spring-loaded biped robot in simulation and on hardware. Generalizability was validated by applying the same learning framework to different walking speeds and terrains in simulation and to walking acquisition on hardware. All results showed successful on-site walking acquisition with a compact nine-dimensional dynamics model, 40 Hz real-time planning, and on-site learning within a few minutes.
ISSN:	2377-3766
DOI:	10.1109/LRA.2023.3303786
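
The abstract's idea of describing a spring-loaded system through energy states that exchange with one another can be illustrated with a minimal sketch. It assumes a one-dimensional spring-loaded point mass, not the paper's four-degree-of-freedom biped or its learned nine-dimensional model. All parameter values and function names are hypothetical and chosen only for illustration. The sketch tracks kinetic, gravitational, and elastic energy along a passive trajectory and checks that their sum stays nearly constant, as the law of conservation of energy requires.

```python
# Hypothetical parameters for a 1-D spring-loaded point mass (not from the paper).
MASS = 10.0          # kg
STIFFNESS = 2000.0   # N/m
REST_LENGTH = 0.5    # m, spring touches the ground below this height
GRAVITY = 9.81       # m/s^2


def energy_states(height, velocity):
    """Return (kinetic, gravitational, elastic) energies in joules."""
    kinetic = 0.5 * MASS * velocity ** 2
    gravitational = MASS * GRAVITY * height
    compression = max(0.0, REST_LENGTH - height)  # spring acts only in stance
    elastic = 0.5 * STIFFNESS * compression ** 2
    return kinetic, gravitational, elastic


def step(height, velocity, dt):
    """Advance the passive hopper by one semi-implicit Euler step."""
    compression = max(0.0, REST_LENGTH - height)
    accel = -GRAVITY + STIFFNESS * compression / MASS
    velocity = velocity + accel * dt   # update velocity first
    height = height + velocity * dt    # then position with the new velocity
    return height, velocity


def simulate(height=0.6, velocity=0.0, dt=1e-4, steps=10000):
    """Return the energy-state triple recorded before each step."""
    history = []
    for _ in range(steps):
        history.append(energy_states(height, velocity))
        height, velocity = step(height, velocity, dt)
    return history


history = simulate()
totals = [sum(states) for states in history]
# Relative spread of total energy; small because no energy is injected or lost.
drift = (max(totals) - min(totals)) / totals[0]
peak_elastic = max(states[2] for states in history)
```

In this toy system the three energy states trade off against each other while their total remains approximately fixed, which is the kind of structure the abstract exploits to keep the dynamics model compact. A controlled robot would additionally inject or remove energy through its actuators, which this passive sketch does not model.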