Think Holistically, Act Down-to-Earth: A Semantic Navigation Strategy With Continuous Environmental Representation and Multi-Step Forward Planning
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2024-05, Vol. 34 (5), p. 3860-3875 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The Object goal Navigation (ObjectNav) task requires an agent to navigate through a previously unknown domestic scenario using spatial and semantic contextual information, where the goal is specified by a semantic label (e.g., find a TV). Such a task is especially challenging as it requires formulating and understanding the complex co-occurrence relations among objects in diverse settings, which is critical for long-sequence navigational decision-making. Existing methods learn to either explicitly represent co-occurrence relationships as discrete semantic priors, or implicitly encode them from raw observations, and thus cannot benefit from the rich environmental semantics. In this work, we propose a novel Deep Reinforcement Learning (DRL) based ObjectNav strategy by actively imagining spatial and semantic clues outside the agent's Field of View (FoV) and further mining Continuous Environmental Representations (CER) using self-supervised learning. Additionally, the illusion of spatial and semantic patterns allows the agent to perform Multi-Step Forward-Looking Planning (MSFLP) by considering the temporal evolution of egocentric local observations. Our approach is thoroughly evaluated and ablated in the visually realistic environments of the Matterport3D (MP3D) dataset. The experimental results show that our method, combining CER and imagination-based MSFLP, facilitates learning complicated semantic priors and navigation skills, thus achieving state-of-the-art performance on the ObjectNav task. In addition, quantitative and qualitative analyses validate the generalization ability and superiority of our method. |
---|---|
ISSN: | 1051-8215, 1558-2205 |
DOI: | 10.1109/TCSVT.2023.3324380 |