Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: We present a novel autonomous robot navigation algorithm for outdoor environments that is capable of handling diverse terrain traversability conditions. Our approach, VLM-GroNav, uses vision-language models (VLMs), integrating them with physical grounding to assess intrinsic terrain properties such as deformability and slipperiness. We use proprioceptive sensing, which provides direct measurements of these physical properties and enhances the overall semantic understanding of the terrain. Our formulation uses in-context learning to ground the VLM's semantic understanding with proprioceptive data, allowing traversability estimates to be updated dynamically based on the robot's real-time physical interactions with the environment. The updated traversability estimates inform both the local and global planners for real-time trajectory replanning. We validate our method on a legged robot (Ghost Vision 60) and a wheeled robot (Clearpath Husky) in diverse real-world outdoor environments with different deformable and slippery terrains. In practice, we observe significant improvements over state-of-the-art methods, with up to a 50% increase in navigation success rate.
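The abstract describes two mechanisms: grounding the VLM's semantic terrain estimates with proprioceptive measurements via in-context learning, and mapping the resulting traversability estimates into costs for the local and global planners. The Python sketch below illustrates one plausible shape of that pipeline; it is not the paper's implementation. All names here (`TerrainSample`, `build_icl_prompt`, `planner_cost`, the commented-out `query_vlm`) are hypothetical, and the inverse-score cost mapping is an assumed example.

```python
from dataclasses import dataclass

@dataclass
class TerrainSample:
    terrain_label: str   # VLM's semantic label for the terrain, e.g. "wet grass"
    sinkage_cm: float    # proprioceptive measure of deformability
    slip_ratio: float    # proprioceptive measure of slipperiness
    traversable: bool    # outcome observed while crossing this terrain

def build_icl_prompt(history: list[TerrainSample], scene_description: str) -> str:
    """Assemble an in-context-learning prompt: past physical interactions
    act as few-shot examples that ground the VLM's semantic estimate."""
    examples = "\n".join(
        f"- {s.terrain_label}: sinkage={s.sinkage_cm:.1f} cm, "
        f"slip={s.slip_ratio:.2f} -> "
        f"{'traversable' if s.traversable else 'not traversable'}"
        for s in history
    )
    return (
        "Past terrain interactions:\n"
        f"{examples}\n\n"
        f"Current camera view: {scene_description}\n"
        "Estimate the traversability of this terrain as a score in [0, 1]."
    )

def planner_cost(traversability: float, base_cost: float = 1.0) -> float:
    """Turn a traversability score into a planning cost: low scores
    (deformable or slippery terrain) inflate the cost, so the planners
    replan trajectories around that region."""
    return base_cost / max(traversability, 1e-3)

# Example usage (query_vlm is a hypothetical VLM call, not shown here):
history = [
    TerrainSample("dry asphalt", 0.2, 0.05, True),
    TerrainSample("mud", 6.4, 0.60, False),
]
prompt = build_icl_prompt(history, "a muddy trail after rain")
# score = query_vlm(prompt)
print(planner_cost(0.3))  # low traversability -> inflated planning cost
```

The key idea this sketch tries to capture is that the prompt grows with the robot's interaction history, so the same visual scene can yield different traversability estimates as physical evidence accumulates.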
DOI: 10.48550/arXiv.2409.20445