Learn to Navigate Autonomously Through Deep Reinforcement Learning

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics (1982), 2022-05, Vol. 69 (5), p. 5342-5352
Main authors: Wu, Keyu; Han, Wang; Abolfazli Esfahani, Mahdi; Yuan, Shenghai
Format: Article
Language: English
Description
Abstract: In this article, we propose a deep reinforcement learning (DRL) algorithm as well as a novel tailored neural network architecture for mobile robots to learn navigation policies autonomously. We first introduce a new feature extractor to better acquire critical spatiotemporal features from raw depth images. In addition, we present a double-source scheme in which experiences are collected alternately from the proposed model and a conventional planner, based on a switching criterion, to provide more diverse and comprehensive samples for learning. Moreover, we propose a dual-soft-actor-critic architecture to train two sets of networks with different purposes simultaneously. Specifically, the primary network learns the autonomous navigation policy, while the secondary network learns the depth feature extractor. In this way, learning performance is improved by decoupling representation learning from policy learning and training the feature extractor separately with more specific goals. Experimental results demonstrate the strong performance of the proposed model: it consistently outperforms both conventional and state-of-the-art DRL-based methods in terms of success rate, and it also exhibits higher trajectory quality and better generalization capability than existing DRL-based methods. Videos of our experiments are available at https://youtu.be/evjU6bOU3UY or OneDrive.
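The double-source scheme described in the abstract can be sketched as a replay-buffer loop that alternates between the learned policy and a conventional planner. This is a minimal illustrative sketch, not the paper's implementation: the toy environment, the random switching criterion, and all names (`ToyEnv`, `collect_experience`, `switch_prob`) are hypothetical, and the paper's actual switching criterion differs.

```python
import random

class ToyEnv:
    """Hypothetical 1-D environment, used only to exercise the collection loop."""
    def __init__(self, goal=3):
        self.goal, self.pos = goal, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos += action
        done = self.pos >= self.goal
        return self.pos, (1.0 if done else -0.1), done

def collect_experience(env, policy_action, planner_action,
                       episodes=4, switch_prob=0.5, seed=0):
    """Fill a replay buffer with transitions drawn alternately from two
    sources: the learned policy and a conventional planner. The random
    per-episode switch is a placeholder for the paper's criterion."""
    rng = random.Random(seed)
    buffer = []
    for _ in range(episodes):
        use_planner = rng.random() < switch_prob  # placeholder switching criterion
        state = env.reset()
        done = False
        while not done:
            act = planner_action if use_planner else policy_action
            action = act(state)
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
    return buffer

# Both "policy" and "planner" are stubs that always step forward by 1.
buffer = collect_experience(ToyEnv(), policy_action=lambda s: 1,
                            planner_action=lambda s: 1)
```

Mixing on-policy rollouts with planner demonstrations in this way gives the critic more diverse coverage of the state space than the fledgling policy alone would provide.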
ISSN: 0278-0046
ISSN: 1557-9948
DOI: 10.1109/TIE.2021.3078353