BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
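The abstract describes splitting a single natural-language command into navigation guidelines (directional command plus landmark) and behavioral guidelines (regulatory action plus object). The following minimal Python sketch illustrates that target structure only; the dataclass names, the stub parser, and the example command are assumptions made for illustration and are not the authors' code (the paper performs this step with an LLM).

```python
# Illustrative sketch, NOT the authors' code: the target structure for the
# LLM parsing step described in the abstract. Dataclass names, the stub
# parser, and the example command are assumptions made for illustration.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NavigationGuideline:
    action: str    # directional command, e.g. "move forward until"
    landmark: str  # associated landmark, e.g. "the building with blue windows"

@dataclass
class BehavioralGuideline:
    action: str    # regulatory action, e.g. "stay on"
    obj: str       # corresponding object, e.g. "pavements"

def parse_instruction_stub(command: str) -> Tuple[List[NavigationGuideline], List[BehavioralGuideline]]:
    """Stand-in for the LLM call: returns a hand-written parse of one example
    command so the two guideline categories are concrete."""
    nav = [NavigationGuideline("move forward until", "the building with blue windows")]
    beh = [BehavioralGuideline("stay on", "pavements")]
    return nav, beh

if __name__ == "__main__":
    nav, beh = parse_instruction_stub(
        "Stay on the pavement and move forward until the building with blue windows.")
    print(nav)
    print(beh)
```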
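The abstract also describes grounding behavioral rules into a behavioral cost map, fusing it with a LiDAR-based occupancy map, and selecting trajectories with an MPC-based planner that weighs landmark progress against behavioral cost. The sketch below shows one plausible form of that fusion and trajectory scoring; the grid size, resolution, weights, and cost terms are assumptions for illustration, not the paper's planner.

```python
# Illustrative sketch, NOT the paper's planner: one plausible way to fuse a
# behavioral cost map with a LiDAR occupancy map and score sampled trajectories
# against a landmark goal. Grid size, resolution, weights, and the cost terms
# are assumptions made for illustration.
import numpy as np

def fuse_maps(behavioral_cost, occupancy, w_beh=1.0, w_occ=10.0):
    """Combine a [0, 1] behavioral cost map with a {0, 1} occupancy map."""
    return w_beh * behavioral_cost + w_occ * occupancy

def trajectory_cost(traj_xy, cost_map, goal_xy, resolution=0.1,
                    w_goal=1.0, w_map=0.5):
    """Endpoint distance to the landmark goal plus map cost accumulated along the path."""
    idx = np.clip((traj_xy / resolution).astype(int), 0, np.array(cost_map.shape) - 1)
    map_cost = cost_map[idx[:, 0], idx[:, 1]].sum()
    goal_cost = np.linalg.norm(traj_xy[-1] - goal_xy)
    return w_goal * goal_cost + w_map * map_cost

# Toy usage: pick the cheapest of three straight-line candidate trajectories.
rng = np.random.default_rng(0)
cost_map = fuse_maps(rng.random((100, 100)), np.zeros((100, 100)))
goal = np.array([8.0, 5.0])
candidates = [np.linspace([0.0, 0.0], [8.0, y], 30) for y in (2.0, 5.0, 8.0)]
best = min(candidates, key=lambda t: trajectory_cost(t, cost_map, goal))
print("best endpoint:", best[-1])
```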

Full description

Bibliographic Details
Main authors: Weerakoon, Kasun; Elnoor, Mohamed; Seneviratne, Gershom; Rajagopal, Vignesh; Arul, Senthil Hariharan; Liang, Jing; Jaffar, Mohamed Khalid M; Manocha, Dinesh
Format: Article
Language: English
Subjects: Computer Science - Robotics
Online access: Order full text
creator Weerakoon, Kasun; Elnoor, Mohamed; Seneviratne, Gershom; Rajagopal, Vignesh; Arul, Senthil Hariharan; Liang, Jing; Jaffar, Mohamed Khalid M; Manocha, Dinesh
description We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Frechet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
doi_str_mv 10.48550/arxiv.2409.16484
format Article
fullrecord arXiv record (cdi_arxiv_primary_2409_16484); created 2024-09-24; rights: http://creativecommons.org/licenses/by/4.0 (free to read); source type: Open Access Repository; full text: https://arxiv.org/abs/2409.16484
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2409.16484
language eng
recordid cdi_arxiv_primary_2409_16484
source arXiv.org
subjects Computer Science - Robotics
title BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes