Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics, June 2024, Vol. 71, No. 6, pp. 1-8
Authors: Miranda, Victor R. F.; Neto, Armando A.; Freitas, Gustavo M.; Mozelli, Leonardo A.
Format: Article
Language: English
Abstract:
This paper addresses the application of Deep Reinforcement Learning (DRL) methods to local navigation, i.e., a robot moving towards a goal location in unknown and cluttered workspaces while equipped only with limited-range exteroceptive sensors. Collision-avoidance policies based on DRL offer advantages, but they are quite susceptible to local minima, since their capacity to learn suitable actions is limited by the sensor range. We address this issue by means of reward shaping in actor-critic networks. A dense reward function, which incorporates map information gained during the training stage, is proposed to increase the agent's capacity to decide on the best action. We also compare the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) algorithms for training our policy. A set of sim-to-sim and sim-to-real trials shows that the proposed reward shaping outperforms the compared methods in terms of generalization, reaching the target at higher rates in maps that are prone to local minima and collisions.
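The abstract does not give the paper's actual reward function; as a rough illustration of the idea of dense, progress-based reward shaping for goal navigation, a minimal sketch follows. The function name, argument names, and constants here are all hypothetical, not taken from the paper; in the authors' approach the distance term would additionally be informed by map knowledge gathered during training rather than plain Euclidean distance.

```python
def shaped_reward(prev_dist, curr_dist, collided, reached, step_penalty=0.01):
    """Hypothetical dense navigation reward (illustrative only).

    prev_dist / curr_dist: distance-to-goal before and after the step
    (in the paper's method this distance would incorporate map information).
    """
    if reached:
        return 100.0          # terminal bonus for arriving at the goal
    if collided:
        return -100.0         # terminal penalty for a collision
    # Dense shaping term: reward progress toward the goal at every step,
    # with a small per-step penalty to discourage dithering.
    return (prev_dist - curr_dist) - step_penalty
```

Because the agent is rewarded at every step rather than only at the terminal states, gradients from TD3 or SAC get a much denser learning signal, which is the motivation for shaping stated in the abstract.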
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2023.3290244