ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs,...

Bibliographic Details
Published in:	IEEE access 2023, Vol.11, p.95060-95078
Main Authors:	Wake, Naoki; Kanehira, Atsushi; Sasabuchi, Kazuhiro; Takamatsu, Jun; Ikeuchi, Katsushi
Format:	Article
Language:	English
Subjects:	Chatbots; ChatGPT; Feedback; large language models; Natural language processing; Oral communication; Planning; Robot control; robot manipulation; Robotics; Robots; Source code; Task analysis; Task planning; Task planning (robotics); Translating; Visualization
Online Access:	Full text

container_end_page	95078
container_issue
container_start_page	95060
container_title	IEEE access
container_volume	11
creator	Wake, Naoki; Kanehira, Atsushi; Sasabuchi, Kazuhiro; Takamatsu, Jun; Ikeuchi, Katsushi
description	This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .
doi_str_mv	10.1109/ACCESS.2023.3310935
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2023_3310935</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10235949</ieee_id><doaj_id>oai_doaj_org_article_d310831d5296475aa84f1dabd9cfc6d2</doaj_id><sourcerecordid>2862622312</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</originalsourceid><addsrcrecordid>eNpNUU1LxDAQLaKgqL9ADwHPXZNMm6belrKuCwuKq15DvqpZdpuaZBX_vV0r4lxmeLz3ZpiXZRcETwjB9fW0aWar1YRiChOAAYHyIDuhhNU5lMAO_83H2XmMazwUH6CyOsmemzeZ5g9PaLbt_acN1qCl717zVbI9evTKJ9T4LgW_Qa5DLzI4v4to1n244Lut7VK8QVPUyGjRtO83TsvkfHeWHbVyE-35bz_Nnm9nT81dvryfL5rpMtcFrlPOlFVgKq2IKoARzSRlujIWNC8pIxVjQIBwzFtDFAemtWYlNRXjVimMLZxmi9HXeLkWfXBbGb6El078AD68ChmS0xsrzPAZDsSUtGZFVUrJi5YYqUytW80MHbyuRq8--PedjUms_S50w_mCckYZpUD2LBhZOvgYg23_thIs9nGIMQ6xj0P8xjGoLkeVs9b-U1Ao66KGb6SuhQg</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2862622312</pqid></control><display><type>article</type><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</creator><creatorcontrib>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</creatorcontrib><description>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. 
We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. 
Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3310935</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Chatbots ; ChatGPT ; Feedback ; large language models ; Natural language processing ; Oral communication ; Planning ; Robot control ; robot manipulation ; Robotics ; Robots ; Source code ; Task analysis ; Task planning ; Task planning (robotics) ; Translating ; Visualization</subject><ispartof>IEEE access, 2023, Vol.11, p.95060-95078</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</citedby><cites>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</cites><orcidid>0000-0001-9758-9357 ; 0000-0001-7457-2878 ; 0000-0001-8278-2373 ; 0000-0002-5408-3089</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10235949$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Wake, Naoki</creatorcontrib><creatorcontrib>Kanehira, Atsushi</creatorcontrib><creatorcontrib>Sasabuchi, Kazuhiro</creatorcontrib><creatorcontrib>Takamatsu, Jun</creatorcontrib><creatorcontrib>Ikeuchi, Katsushi</creatorcontrib><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><title>IEEE 
access</title><addtitle>Access</addtitle><description>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. 
Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</description><subject>Chatbots</subject><subject>ChatGPT</subject><subject>Feedback</subject><subject>large language models</subject><subject>Natural language processing</subject><subject>Oral communication</subject><subject>Planning</subject><subject>Robot control</subject><subject>robot manipulation</subject><subject>Robotics</subject><subject>Robots</subject><subject>Source code</subject><subject>Task analysis</subject><subject>Task planning</subject><subject>Task planning (robotics)</subject><subject>Translating</subject><subject>Visualization</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUU1LxDAQLaKgqL9ADwHPXZNMm6belrKuCwuKq15DvqpZdpuaZBX_vV0r4lxmeLz3ZpiXZRcETwjB9fW0aWar1YRiChOAAYHyIDuhhNU5lMAO_83H2XmMazwUH6CyOsmemzeZ5g9PaLbt_acN1qCl717zVbI9evTKJ9T4LgW_Qa5DLzI4v4to1n244Lut7VK8QVPUyGjRtO83TsvkfHeWHbVyE-35bz_Nnm9nT81dvryfL5rpMtcFrlPOlFVgKq2IKoARzSRlujIWNC8pIxVjQIBwzFtDFAemtWYlNRXjVimMLZxmi9HXeLkWfXBbGb6El078AD68ChmS0xsrzPAZDsSUtGZFVUrJi5YYqUytW80MHbyuRq8--PedjUms_S50w_mCckYZpUD2LBhZOvgYg23_thIs9nGIMQ6xj0P8xjGoLkeVs9b-U1Ao66KGb6SuhQg</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Wake, Naoki</creator><creator>Kanehira, Atsushi</creator><creator>Sasabuchi, Kazuhiro</creator><creator>Takamatsu, Jun</creator><creator>Ikeuchi, Katsushi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9758-9357</orcidid><orcidid>https://orcid.org/0000-0001-7457-2878</orcidid><orcidid>https://orcid.org/0000-0001-8278-2373</orcidid><orcidid>https://orcid.org/0000-0002-5408-3089</orcidid></search><sort><creationdate>2023</creationdate><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><author>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Chatbots</topic><topic>ChatGPT</topic><topic>Feedback</topic><topic>large language models</topic><topic>Natural language processing</topic><topic>Oral communication</topic><topic>Planning</topic><topic>Robot control</topic><topic>robot manipulation</topic><topic>Robotics</topic><topic>Robots</topic><topic>Source code</topic><topic>Task analysis</topic><topic>Task planning</topic><topic>Task planning (robotics)</topic><topic>Translating</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wake, Naoki</creatorcontrib><creatorcontrib>Kanehira, Atsushi</creatorcontrib><creatorcontrib>Sasabuchi, Kazuhiro</creatorcontrib><creatorcontrib>Takamatsu, Jun</creatorcontrib><creatorcontrib>Ikeuchi, Katsushi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access 
Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wake, Naoki</au><au>Kanehira, Atsushi</au><au>Sasabuchi, Kazuhiro</au><au>Takamatsu, Jun</au><au>Ikeuchi, Katsushi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023</date><risdate>2023</risdate><volume>11</volume><spage>95060</spage><epage>95078</epage><pages>95060-95078</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. 
In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3310935</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-9758-9357</orcidid><orcidid>https://orcid.org/0000-0001-7457-2878</orcidid><orcidid>https://orcid.org/0000-0001-8278-2373</orcidid><orcidid>https://orcid.org/0000-0002-5408-3089</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2023, Vol.11, p.95060-95078
issn	2169-3536 2169-3536
language	eng
recordid	cdi_crossref_primary_10_1109_ACCESS_2023_3310935
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Chatbots; ChatGPT; Feedback; large language models; Natural language processing; Oral communication; Planning; Robot control; robot manipulation; Robotics; Robots; Source code; Task analysis; Task planning; Task planning (robotics); Translating; Visualization
title	ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T16%3A18%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ChatGPT%20Empowered%20Long-Step%20Robot%20Control%20in%20Various%20Environments:%20A%20Case%20Application&rft.jtitle=IEEE%20access&rft.au=Wake,%20Naoki&rft.date=2023&rft.volume=11&rft.spage=95060&rft.epage=95078&rft.pages=95060-95078&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3310935&rft_dat=%3Cproquest_cross%3E2862622312%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2862622312&rft_id=info:pmid/&rft_ieee_id=10235949&rft_doaj_id=oai_doaj_org_article_d310831d5296475aa84f1dabd9cfc6d2&rfr_iscdi=true