Rover-IRL: Inverse Reinforcement Learning With Soft Value Iteration Networks for Planetary Rover Path Planning

Bibliographic details
Published in:	IEEE Robotics and Automation Letters, April 2019, Vol. 4, No. 2, pp. 1387-1394
Authors:	Pflueger, Max; Agha, Ali; Sukhatme, Gaurav S.
Format:	Article
Language:	English
Keywords:	Algorithms; Artificial neural networks; Datasets; deep learning in robotics and automation; Imagery; Iterative methods; learning from demonstration; Machine learning; Mars; Mission planning; Navigation; Orbits; Path planning; Planetary rovers; Planning; Reinforcement learning; Space robotics and automation; Space vehicles

Description
Abstract:	Planetary rovers, such as those currently on Mars, face difficult path planning problems, both during the mission planning stages before landing and once on the ground. In this work, we present a new approach to these planning problems based on inverse reinforcement learning, using deep convolutional networks and value iteration networks (VIN) as important internal structures. VIN approximate the value iteration (VI) algorithm with convolutional neural networks, making VI fully differentiable. We propose a modification to the value iteration recurrence, referred to as the soft value iteration network (SVIN), which is designed to produce more effective training gradients through the VIN. SVIN relies on an internal soft policy model, in which the policy is represented as a probability distribution over all possible actions rather than as a deterministic policy that returns only the best action. We demonstrate the effectiveness of the proposed architecture on both a grid world dataset and a highly realistic synthetic dataset generated from currently deployed rover mission planning tools and real Mars imagery.
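The abstract contrasts a hard value-iteration backup, which keeps only the best action, with a soft backup driven by a probability distribution over all actions. The Python sketch below illustrates that contrast on a tiny grid world. It is a generic illustration under stated assumptions, not the authors' SVIN: the 4-connected action set, the per-cell reward R(s), the discount `gamma`, the softmax `temperature`, and the helper names `softmax`, `q_values`, and `value_iteration` are all hypothetical. A real VIN would compute the neighbor lookups with convolutions over reward and value maps instead of explicit loops.

```python
import math

# Hypothetical 4-connected action set: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def softmax(xs, temperature=1.0):
    """Numerically stable softmax; a lower temperature moves it closer to argmax."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def q_values(reward, value, r, c, gamma):
    """Q(s, a) = R(s) + gamma * V(s') for each action; off-grid moves stay in place."""
    rows, cols = len(reward), len(reward[0])
    qs = []
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            nr, nc = r, c
        qs.append(reward[r][c] + gamma * value[nr][nc])
    return qs


def value_iteration(reward, gamma=0.9, iters=50, soft=False, temperature=1.0):
    """Run `iters` synchronous backups over a 2-D reward grid.

    soft=False: V(s) = max_a Q(s, a)             (standard hard backup)
    soft=True:  V(s) = sum_a pi(a|s) * Q(s, a),  pi = softmax(Q / temperature)
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                qs = q_values(reward, value, r, c, gamma)
                if soft:
                    # Expected Q-value under the soft (stochastic) policy.
                    probs = softmax(qs, temperature)
                    new_value[r][c] = sum(p * q for p, q in zip(probs, qs))
                else:
                    # Deterministic policy: keep only the best action.
                    new_value[r][c] = max(qs)
        value = new_value
    return value


if __name__ == "__main__":
    # Hypothetical terrain rewards: -1 per cell, a hazard (-5) in the middle,
    # and a goal (+10) in the bottom-right corner.
    reward = [[-1, -1, -1],
              [-1, -5, -1],
              [-1, -1, 10]]
    hard = value_iteration(reward)
    soft = value_iteration(reward, soft=True, temperature=1.0)
    for h_row, s_row in zip(hard, soft):
        print(" ".join(f"{h:7.2f}/{s:7.2f}" for h, s in zip(h_row, s_row)))
```

With `soft=True`, every Q-value contributes to V(s) through the softmax weights, so in a differentiable implementation a gradient can reach all actions, whereas the max passes gradient only to the maximizing action. This is one plausible reading of the abstract's motivation for SVIN, not the paper's exact recurrence. Because the soft backup averages rather than maximizes, its values never exceed the hard ones, and they approach them as `temperature` shrinks.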
ISSN:	2377-3766
DOI:	10.1109/LRA.2019.2895892