A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework

The key module for autonomous mobile robots is path planning and obstacle avoidance. Global path planning based on known maps has been effectively achieved. Local path planning in unknown dynamic environments is still very challenging due to the lack of detailed environmental information and unpredi...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Sensors (Basel, Switzerland) Switzerland), 2023-02, Vol.23 (4), p.2036
Hauptverfasser:	Yin, Yan, Chen, Zhiyu, Liu, Gang, Guo, Jianwei
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial intelligence auxiliary reward functions Computer simulation D3QN Data acquisition Deep learning Experiments exploration-exploitation Localization n-step Neural networks Obstacle avoidance Path planning Planning Remote sensing Robot dynamics Robots turtlebot3
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The key module for autonomous mobile robots is path planning and obstacle avoidance. Global path planning based on known maps has been effectively achieved. Local path planning in unknown dynamic environments is still very challenging due to the lack of detailed environmental information and unpredictability. This paper proposes an end-to-end local path planner n-step dueling double DQN with reward-based ϵ-greedy (RND3QN) based on a deep reinforcement learning framework, which acquires environmental data from LiDAR as input and uses a neural network to fit Q-values to output the corresponding discrete actions. The bias is reduced using n-step bootstrapping based on deep Q-network (DQN). The ϵ-greedy exploration-exploitation strategy is improved with the reward value as a measure of exploration, and an auxiliary reward function is introduced to increase the reward distribution of the sparse reward environment. Simulation experiments are conducted on the gazebo to test the algorithm's effectiveness. The experimental data demonstrate that the average total reward value of RND3QN is higher than that of algorithms such as dueling double DQN (D3QN), and the success rates are increased by 174%, 65%, and 61% over D3QN on three stages, respectively. We experimented on the turtlebot3 waffle pi robot, and the strategies learned from the simulation can be effectively transferred to the real robot.
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s23042036