GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code g...

Bibliographic details
Main authors: Mei, Aoran; Wang, Jianhua; Zhu, Guo-Niu; Gan, Zhongxue
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Mei, Aoran
Wang, Jianhua
Zhu, Guo-Niu
Gan, Zhongxue
description With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.
doi_str_mv 10.48550/arxiv.2405.13751
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2405_13751</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2405_13751</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-16840c66b18daf53c871cae46f31d507c02bfa76ed27c032cfee1fdce344b6ed3</originalsourceid><addsrcrecordid>eNotj81OhDAUhdm4MKMP4Mr7AmBLoRB34-iMJkw0hszCDbn0hzRAq6348_YDo6tzcs7Jvfmi6IqSJCvznNyg_zFfSZqRPKGsyOl59LHDUR2q_S2s4V4JE4yz8Yi9sR1s_dx9O9-Ddh5eXes-jYAaQw8vA1q7bO4wKAnOwsGECQeo0HYTdgr2TqohAFoJb8q7OEwjLL_CRXSmcQjq8l9XUb19qDePcfW8e9qsqxh5QWPKy4wIzltaStQ5E2VBBaqMa0ZlTgpB0lZjwZVMZ89SoZWiWgrFsqydU7aKrv_Onpibd29G9L_Nwt6c2NkRF95U8w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games</title><source>arXiv.org</source><creator>Mei, Aoran ; Wang, Jianhua ; Zhu, Guo-Niu ; Gan, Zhongxue</creator><creatorcontrib>Mei, Aoran ; Wang, Jianhua ; Zhu, Guo-Niu ; Gan, Zhongxue</creatorcontrib><description>With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. 
Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.</description><identifier>DOI: 10.48550/arxiv.2405.13751</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Robotics</subject><creationdate>2024-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2405.13751$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2405.13751$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mei, Aoran</creatorcontrib><creatorcontrib>Wang, Jianhua</creatorcontrib><creatorcontrib>Zhu, Guo-Niu</creatorcontrib><creatorcontrib>Gan, Zhongxue</creatorcontrib><title>GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games</title><description>With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. 
Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Robotics</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj81OhDAUhdm4MKMP4Mr7AmBLoRB34-iMJkw0hszCDbn0hzRAq6348_YDo6tzcs7Jvfmi6IqSJCvznNyg_zFfSZqRPKGsyOl59LHDUR2q_S2s4V4JE4yz8Yi9sR1s_dx9O9-Ddh5eXes-jYAaQw8vA1q7bO4wKAnOwsGECQeo0HYTdgr2TqohAFoJb8q7OEwjLL_CRXSmcQjq8l9XUb19qDePcfW8e9qsqxh5QWPKy4wIzltaStQ5E2VBBaqMa0ZlTgpB0lZjwZVMZ89SoZWiWgrFsqydU7aKrv_Onpibd29G9L_Nwt6c2NkRF95U8w</recordid><startdate>20240522</startdate><enddate>20240522</enddate><creator>Mei, Aoran</creator><creator>Wang, Jianhua</creator><creator>Zhu, Guo-Niu</creator><creator>Gan, Zhongxue</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240522</creationdate><title>GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games</title><author>Mei, Aoran ; Wang, Jianhua ; Zhu, Guo-Niu ; Gan, 
Zhongxue</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-16840c66b18daf53c871cae46f31d507c02bfa76ed27c032cfee1fdce344b6ed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Robotics</topic><toplevel>online_resources</toplevel><creatorcontrib>Mei, Aoran</creatorcontrib><creatorcontrib>Wang, Jianhua</creatorcontrib><creatorcontrib>Zhu, Guo-Niu</creatorcontrib><creatorcontrib>Gan, Zhongxue</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mei, Aoran</au><au>Wang, Jianhua</au><au>Zhu, Guo-Niu</au><au>Gan, Zhongxue</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games</atitle><date>2024-05-22</date><risdate>2024</risdate><abstract>With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. 
Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.</abstract><doi>10.48550/arxiv.2405.13751</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2405.13751
ispartof
issn
language eng
recordid cdi_arxiv_primary_2405_13751
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Robotics
title GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T08%3A41%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GameVLM:%20A%20Decision-making%20Framework%20for%20Robotic%20Task%20Planning%20Based%20on%20Visual%20Language%20Models%20and%20Zero-sum%20Games&rft.au=Mei,%20Aoran&rft.date=2024-05-22&rft_id=info:doi/10.48550/arxiv.2405.13751&rft_dat=%3Carxiv_GOX%3E2405_13751%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true