BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
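The abstract describes splitting a single natural-language command into navigation guidelines (directional command plus landmark) and behavioral guidelines (regulatory action plus object). The following minimal Python sketch illustrates that target structure only; the dataclass names, the stub parser, and the example command are assumptions made for illustration and are not the authors' code (the paper performs this step with an LLM).

```python
# Illustrative sketch, NOT the authors' code: the target structure for the
# LLM parsing step described in the abstract. Dataclass names, the stub
# parser, and the example command are assumptions made for illustration.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NavigationGuideline:
    action: str    # directional command, e.g. "move forward until"
    landmark: str  # associated landmark, e.g. "the building with blue windows"

@dataclass
class BehavioralGuideline:
    action: str    # regulatory action, e.g. "stay on"
    obj: str       # corresponding object, e.g. "pavements"

def parse_instruction_stub(command: str) -> Tuple[List[NavigationGuideline], List[BehavioralGuideline]]:
    """Stand-in for the LLM call: returns a hand-written parse of one example
    command so the two guideline categories are concrete."""
    nav = [NavigationGuideline("move forward until", "the building with blue windows")]
    beh = [BehavioralGuideline("stay on", "pavements")]
    return nav, beh

if __name__ == "__main__":
    nav, beh = parse_instruction_stub(
        "Stay on the pavement and move forward until the building with blue windows.")
    print(nav)
    print(beh)
```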
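The abstract also describes grounding behavioral rules into a behavioral cost map, fusing it with a LiDAR-based occupancy map, and selecting trajectories with an MPC-based planner that weighs landmark progress against behavioral cost. The sketch below shows one plausible form of that fusion and trajectory scoring; the grid size, resolution, weights, and cost terms are assumptions for illustration, not the paper's planner.

```python
# Illustrative sketch, NOT the paper's planner: one plausible way to fuse a
# behavioral cost map with a LiDAR occupancy map and score sampled trajectories
# against a landmark goal. Grid size, resolution, weights, and the cost terms
# are assumptions made for illustration.
import numpy as np

def fuse_maps(behavioral_cost, occupancy, w_beh=1.0, w_occ=10.0):
    """Combine a [0, 1] behavioral cost map with a {0, 1} occupancy map."""
    return w_beh * behavioral_cost + w_occ * occupancy

def trajectory_cost(traj_xy, cost_map, goal_xy, resolution=0.1,
                    w_goal=1.0, w_map=0.5):
    """Endpoint distance to the landmark goal plus map cost accumulated along the path."""
    idx = np.clip((traj_xy / resolution).astype(int), 0, np.array(cost_map.shape) - 1)
    map_cost = cost_map[idx[:, 0], idx[:, 1]].sum()
    goal_cost = np.linalg.norm(traj_xy[-1] - goal_xy)
    return w_goal * goal_cost + w_map * map_cost

# Toy usage: pick the cheapest of three straight-line candidate trajectories.
rng = np.random.default_rng(0)
cost_map = fuse_maps(rng.random((100, 100)), np.zeros((100, 100)))
goal = np.array([8.0, 5.0])
candidates = [np.linspace([0.0, 0.0], [8.0, y], 30) for y in (2.0, 5.0, 8.0)]
best = min(candidates, key=lambda t: trajectory_cost(t, cost_map, goal))
print("best endpoint:", best[-1])
```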

Full description

Bibliographic Details
Main authors: Weerakoon, Kasun; Elnoor, Mohamed; Seneviratne, Gershom; Rajagopal, Vignesh; Arul, Senthil Hariharan; Liang, Jing; Jaffar, Mohamed Khalid M; Manocha, Dinesh
Format: Article
Language: English
Subjects: Computer Science - Robotics
Online access: Order full text
creator Weerakoon, Kasun; Elnoor, Mohamed; Seneviratne, Gershom; Rajagopal, Vignesh; Arul, Senthil Hariharan; Liang, Jing; Jaffar, Mohamed Khalid M; Manocha, Dinesh
description We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Frechet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
doi_str_mv 10.48550/arxiv.2409.16484
format Article
fullrecord arXiv record (cdi_arxiv_primary_2409_16484); created 2024-09-24; rights: http://creativecommons.org/licenses/by/4.0 (free to read); source type: Open Access Repository; full text: https://arxiv.org/abs/2409.16484
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2409.16484
language eng
recordid cdi_arxiv_primary_2409_16484
source arXiv.org
subjects Computer Science - Robotics
title BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes