Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation
Published in: IEEE Robotics and Automation Letters, vol. 9, no. 5, May 2024, pp. 4083-4090
Format: Article
Language: English
Abstract: We present language-guided exploration (LGX), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), in which an embodied agent navigates to a uniquely described target object in a previously unseen environment. Our approach uses large language models (LLMs) for this task, leveraging their commonsense-reasoning capabilities to make sequential navigational decisions. Simultaneously, we perform generalized target-object detection using a pre-trained vision-language grounding model. We achieve state-of-the-art zero-shot object navigation results on RoboTHOR, with a success rate (SR) improvement of over 27% over the current baseline, OWL-ViT CLIP on Wheels (OWL CoW). Furthermore, we study the use of LLMs for robot navigation and present an analysis of how various prompting strategies affect the model output. Finally, we showcase the benefits of our approach via real-world experiments, which indicate the superior performance of LGX in detecting and navigating to visually unique objects.
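The abstract outlines two moving parts: a pre-trained vision-language grounding model that scores a free-form object description against the current camera frame, and an LLM that is prompted with the currently visible objects to pick where to explore next. Below is a minimal Python sketch of how such a pipeline might be wired up, using OWL-ViT through the Hugging Face transformers library. The checkpoint choice, prompt wording, and the helper names detect_target and build_navigation_prompt are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of an L-ZSON-style pipeline: (1) open-vocabulary
# target detection with OWL-ViT, (2) an LLM prompt that turns visible
# objects into a sequential navigation decision. The helpers here are
# hypothetical, not the authors' code.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")


def detect_target(image: Image.Image, description: str, threshold: float = 0.1):
    """Score a free-form target description (e.g. "cat-shaped mug")
    against the current egocentric frame; return the best box, if any."""
    inputs = processor(text=[[description]], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # PIL size is (width, height); the post-processor wants (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]
    if len(results["scores"]) == 0:
        return None  # target not visible in this frame
    best = results["scores"].argmax()
    return results["boxes"][best].tolist(), results["scores"][best].item()


def build_navigation_prompt(visible_objects: list[str], target: str) -> str:
    """Commonsense-reasoning prompt: ask an LLM which visible object the
    agent should move toward next in its search for the target."""
    return (
        f"You are helping a robot find a '{target}' in an unseen house. "
        f"It currently sees: {', '.join(visible_objects)}. "
        "Which one of these objects should it move toward next? "
        "Answer with exactly one object from the list."
    )
```

Scoring a free-form description rather than a fixed label set is what would let a pipeline of this shape localize uniquely described targets such as a "cat-shaped mug" without any category-specific training.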
ISSN: 2377-3766
DOI: 10.1109/LRA.2023.3346800