Template Model Inspired Task Space Learning for Robust Bipedal Locomotion
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | This work presents a hierarchical framework for bipedal locomotion that
combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for
the online generation of task space commands with a model-based low-level (LL)
controller to track the desired task space trajectories. Unlike
traditional end-to-end learning approaches, our HL policy draws on insights from
the angular momentum-based linear inverted pendulum (ALIP) to carefully design
the observation and action spaces of the Markov Decision Process (MDP). This
simple yet effective design creates an insightful mapping between a
low-dimensional state that effectively captures the complex dynamics of bipedal
locomotion and a set of task space outputs that shape the walking gait of the
robot. The HL policy is agnostic to the task space LL controller, which
increases the flexibility of the design and generalization of the framework to
other bipedal robots. This hierarchical design results in a learning-based
framework with improved performance, data efficiency, and robustness compared
with the ALIP model-based approach and state-of-the-art learning-based
frameworks for bipedal locomotion. The proposed hierarchical controller is
tested on three different robots: Rabbit, a five-link underactuated planar
biped; Walker2D, a seven-link fully actuated planar biped; and Digit, a 3D
humanoid robot with 20 actuated joints. The trained policy naturally learns
human-like locomotion behaviors and is able to effectively track a wide range
of walking speeds while preserving the robustness and stability of the walking
gait even under adversarial conditions. |
---|---|
DOI: | 10.48550/arxiv.2309.15442 |