Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning

Although deep reinforcement learning has recently achieved some successes in robot navigation, there are still unsolved problems. Particularly, a robot guided by a distant ultimate goal is easy to get stuck in danger or encounter collisions in dynamic crowded environments due to the lack of long-ter...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	International journal of control, automation, and systems automation, and systems, 2023-07, Vol.21 (7), p.2350-2362
Hauptverfasser:	Zhang, Tianle, Liu, Zhen, Pu, Zhiqiang, Yi, Jianqiang, Liang, Yanyan, Zhang, Du
Format:	Artikel
Sprache:	eng
Schlagworte:	Barriers Collision avoidance Control Deep learning Engineering Mechatronics Navigation Regular Papers Robotics Robots
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Although deep reinforcement learning has recently achieved some successes in robot navigation, there are still unsolved problems. Particularly, a robot guided by a distant ultimate goal is easy to get stuck in danger or encounter collisions in dynamic crowded environments due to the lack of long-term perspectives. In this paper, a novel subgoal-guided approach based on two-level hierarchical deep reinforcement learning with spatial-temporal graph attention networks (ST-GANets), called SG-HDRL, is proposed for a robot navigating in a dynamic crowded environment with autonomous obstacles, e.g., crowd. Specifically, the high-level policy, that models the spatial-temporal relation between the robot and the obstacles using the obstacles’ trajectories by the designed high-level ST-GANet, generates intermediate subgoals from a longer-term perspective over higher temporal scales. The subgoals give a favorable and collision-free direction to avoid encountering danger or collisions while approaching the ultimate goal. The low-level policy, that similarly implements the designed low-level ST-GANet to implicitly predict the obstacles’ motions, takes the subgoals as short-term guidance through an intrinsic reward incentive to generate primitive actions for the robot. Simulation results demonstrate that SG-HDRL using ST-GANets has better performances compared with state-of-the-art baselines. Furthermore, the proposed SG-HDRL is deployed to an experimental platform based on omnidirectional cars, and experiment results validate the effectiveness and practicability of the proposed SG-HDRL.
ISSN:	1598-6446 2005-4092
DOI:	10.1007/s12555-022-0171-z