ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs,...

Bibliographic details
Published in: IEEE access 2023, Vol.11, p.95060-95078
Main authors: Wake, Naoki; Kanehira, Atsushi; Sasabuchi, Kazuhiro; Takamatsu, Jun; Ikeuchi, Katsushi
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 95078
container_issue
container_start_page 95060
container_title IEEE access
container_volume 11
creator Wake, Naoki
Kanehira, Atsushi
Sasabuchi, Kazuhiro
Takamatsu, Jun
Ikeuchi, Katsushi
description This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of the token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .
doi_str_mv 10.1109/ACCESS.2023.3310935
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2023_3310935</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10235949</ieee_id><doaj_id>oai_doaj_org_article_d310831d5296475aa84f1dabd9cfc6d2</doaj_id><sourcerecordid>2862622312</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</originalsourceid><addsrcrecordid>eNpNUU1LxDAQLaKgqL9ADwHPXZNMm6belrKuCwuKq15DvqpZdpuaZBX_vV0r4lxmeLz3ZpiXZRcETwjB9fW0aWar1YRiChOAAYHyIDuhhNU5lMAO_83H2XmMazwUH6CyOsmemzeZ5g9PaLbt_acN1qCl717zVbI9evTKJ9T4LgW_Qa5DLzI4v4to1n244Lut7VK8QVPUyGjRtO83TsvkfHeWHbVyE-35bz_Nnm9nT81dvryfL5rpMtcFrlPOlFVgKq2IKoARzSRlujIWNC8pIxVjQIBwzFtDFAemtWYlNRXjVimMLZxmi9HXeLkWfXBbGb6El078AD68ChmS0xsrzPAZDsSUtGZFVUrJi5YYqUytW80MHbyuRq8--PedjUms_S50w_mCckYZpUD2LBhZOvgYg23_thIs9nGIMQ6xj0P8xjGoLkeVs9b-U1Ao66KGb6SuhQg</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2862622312</pqid></control><display><type>article</type><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</creator><creatorcontrib>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</creatorcontrib><description>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. 
We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. 
Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3310935</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Chatbots ; ChatGPT ; Feedback ; large language models ; Natural language processing ; Oral communication ; Planning ; Robot control ; robot manipulation ; Robotics ; Robots ; Source code ; Task analysis ; Task planning ; Task planning (robotics) ; Translating ; Visualization</subject><ispartof>IEEE access, 2023, Vol.11, p.95060-95078</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</citedby><cites>FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</cites><orcidid>0000-0001-9758-9357 ; 0000-0001-7457-2878 ; 0000-0001-8278-2373 ; 0000-0002-5408-3089</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10235949$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Wake, Naoki</creatorcontrib><creatorcontrib>Kanehira, Atsushi</creatorcontrib><creatorcontrib>Sasabuchi, Kazuhiro</creatorcontrib><creatorcontrib>Takamatsu, Jun</creatorcontrib><creatorcontrib>Ikeuchi, Katsushi</creatorcontrib><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><title>IEEE 
access</title><addtitle>Access</addtitle><description>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. 
Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</description><subject>Chatbots</subject><subject>ChatGPT</subject><subject>Feedback</subject><subject>large language models</subject><subject>Natural language processing</subject><subject>Oral communication</subject><subject>Planning</subject><subject>Robot control</subject><subject>robot manipulation</subject><subject>Robotics</subject><subject>Robots</subject><subject>Source code</subject><subject>Task analysis</subject><subject>Task planning</subject><subject>Task planning (robotics)</subject><subject>Translating</subject><subject>Visualization</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUU1LxDAQLaKgqL9ADwHPXZNMm6belrKuCwuKq15DvqpZdpuaZBX_vV0r4lxmeLz3ZpiXZRcETwjB9fW0aWar1YRiChOAAYHyIDuhhNU5lMAO_83H2XmMazwUH6CyOsmemzeZ5g9PaLbt_acN1qCl717zVbI9evTKJ9T4LgW_Qa5DLzI4v4to1n244Lut7VK8QVPUyGjRtO83TsvkfHeWHbVyE-35bz_Nnm9nT81dvryfL5rpMtcFrlPOlFVgKq2IKoARzSRlujIWNC8pIxVjQIBwzFtDFAemtWYlNRXjVimMLZxmi9HXeLkWfXBbGb6El078AD68ChmS0xsrzPAZDsSUtGZFVUrJi5YYqUytW80MHbyuRq8--PedjUms_S50w_mCckYZpUD2LBhZOvgYg23_thIs9nGIMQ6xj0P8xjGoLkeVs9b-U1Ao66KGb6SuhQg</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Wake, Naoki</creator><creator>Kanehira, Atsushi</creator><creator>Sasabuchi, Kazuhiro</creator><creator>Takamatsu, Jun</creator><creator>Ikeuchi, Katsushi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9758-9357</orcidid><orcidid>https://orcid.org/0000-0001-7457-2878</orcidid><orcidid>https://orcid.org/0000-0001-8278-2373</orcidid><orcidid>https://orcid.org/0000-0002-5408-3089</orcidid></search><sort><creationdate>2023</creationdate><title>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</title><author>Wake, Naoki ; Kanehira, Atsushi ; Sasabuchi, Kazuhiro ; Takamatsu, Jun ; Ikeuchi, Katsushi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-6beb3d7cb1b4361c6a26c7de3c852617663131808fd1b836ccc652d768ebb00e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Chatbots</topic><topic>ChatGPT</topic><topic>Feedback</topic><topic>large language models</topic><topic>Natural language processing</topic><topic>Oral communication</topic><topic>Planning</topic><topic>Robot control</topic><topic>robot manipulation</topic><topic>Robotics</topic><topic>Robots</topic><topic>Source code</topic><topic>Task analysis</topic><topic>Task planning</topic><topic>Task planning (robotics)</topic><topic>Translating</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wake, Naoki</creatorcontrib><creatorcontrib>Kanehira, Atsushi</creatorcontrib><creatorcontrib>Sasabuchi, Kazuhiro</creatorcontrib><creatorcontrib>Takamatsu, Jun</creatorcontrib><creatorcontrib>Ikeuchi, Katsushi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access 
Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wake, Naoki</au><au>Kanehira, Atsushi</au><au>Sasabuchi, Kazuhiro</au><au>Takamatsu, Jun</au><au>Ikeuchi, Katsushi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023</date><risdate>2023</risdate><volume>11</volume><spage>95060</spage><epage>95078</epage><pages>95060-95078</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of token limit imposed on ChatGPT. 
In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate post-operation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts .</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3310935</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-9758-9357</orcidid><orcidid>https://orcid.org/0000-0001-7457-2878</orcidid><orcidid>https://orcid.org/0000-0001-8278-2373</orcidid><orcidid>https://orcid.org/0000-0002-5408-3089</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023, Vol.11, p.95060-95078
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2023_3310935
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Chatbots
ChatGPT
Feedback
large language models
Natural language processing
Oral communication
Planning
Robot control
robot manipulation
Robotics
Robots
Source code
Task analysis
Task planning
Task planning (robotics)
Translating
Visualization
title ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T16%3A18%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ChatGPT%20Empowered%20Long-Step%20Robot%20Control%20in%20Various%20Environments:%20A%20Case%20Application&rft.jtitle=IEEE%20access&rft.au=Wake,%20Naoki&rft.date=2023&rft.volume=11&rft.spage=95060&rft.epage=95078&rft.pages=95060-95078&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3310935&rft_dat=%3Cproquest_cross%3E2862622312%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2862622312&rft_id=info:pmid/&rft_ieee_id=10235949&rft_doaj_id=oai_doaj_org_article_d310831d5296475aa84f1dabd9cfc6d2&rfr_iscdi=true