DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

Bibliographic Details
Published in:	IEEE Robotics and Automation Letters, 2024-09, Vol. 9 (9), p. 7389-7396
Main authors:	Ma, Ji; Dai, Hongming; Mu, Yao; Wu, Pengying; Wang, Hao; Chi, Xiaowei; Fei, Yang; Zhang, Shanghang; Liu, Chang
Format:	Article
Language:	eng
Subjects:	Algorithms; Artificial intelligence; Collision avoidance; Data sets for robot learning; data sets for robotic vision; Datasets; embodied AI; Humanoid; Humanoid robots; Image color analysis; Moving obstacles; Navigation; Object recognition; Obstacle avoidance; semantic scene understanding; Task analysis; Task complexity; Three-dimensional displays; Training; zero-shot object navigation

Description
Summary:	Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, thus exhibiting noticeable discrepancies from real-world situations. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE) that comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse distinct-attribute objects, and valuable textual hints. Besides, different from existing datasets that only provide collision checking between the agent and static obstacles, we enhance DOZE by integrating capabilities for detecting collisions between the agent and moving obstacles. This novel functionality enables the evaluation of the agents' collision avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE, revealing substantial room for improvement in existing approaches concerning navigation efficiency, safety, and object recognition accuracy.
ISSN:	2377-3766
DOI:	10.1109/LRA.2024.3426381