LEGENT: Open Platform for Embodied Agents
Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platform for developing embodied agents using LLMs and LMMs. LEGENT offers a dual approach: a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface, and a sophisticated data generation pipeline utilizing advanced algorithms to exploit supervision from simulated worlds at scale. In our experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks, showcasing promising generalization capabilities.
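To make the abstract's "environment plus data pipeline" design concrete, below is a minimal, self-contained sketch of the kind of observe-act loop such a platform runs to generate supervision at scale: a scripted oracle with privileged access to simulator state labels each egocentric observation with a ground-truth action, yielding (observation, instruction, action) triples for training a vision-language-action model. Everything here (the ToyEnv class, the Obs fields, the action strings, the oracle) is a hypothetical illustration under stated assumptions, not LEGENT's actual API.

```python
# Hypothetical sketch of trajectory collection in a simulated world.
# None of these names come from LEGENT; they only illustrate the idea
# of harvesting oracle-labeled supervision from a simulator.

from dataclasses import dataclass

@dataclass
class Obs:
    image: str          # placeholder for an egocentric RGB frame
    agent_pos: int      # simulator-internal state the oracle may read

class ToyEnv:
    """1-D corridor: the agent must walk right until it reaches the goal."""
    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return Obs(image=f"frame@{self.pos}", agent_pos=self.pos)

    def step(self, action):
        if action == "move_forward":
            self.pos += 1
        done = self.pos >= self.goal
        return Obs(image=f"frame@{self.pos}", agent_pos=self.pos), done

def oracle(obs, goal):
    # Privileged "solver": reads simulator state to emit the ground-truth action.
    return "move_forward" if obs.agent_pos < goal else "stop"

def collect(env, instruction, max_steps=10):
    """Roll out the oracle and record the (frame, instruction, action)
    triples a vision-language-action model would train on."""
    obs, data = env.reset(), []
    for _ in range(max_steps):
        action = oracle(obs, env.goal)
        data.append((obs.image, instruction, action))
        obs, done = env.step(action)
        if done:
            break
    return data

print(collect(ToyEnv(), "walk to the end of the corridor"))
```

A real pipeline would store rendered frames and a far richer action space; the point is only the shape of the supervision signal that simulated worlds make cheap to produce.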
creator | Cheng, Zhili; Wang, Zhitong; Hu, Jinyi; Hu, Shengding; Liu, An; Tu, Yuge; Li, Pengkai; Shi, Lei; Liu, Zhiyuan; Sun, Maosong |
description | Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platform for developing embodied agents using LLMs and LMMs. LEGENT offers a dual approach: a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface, and a sophisticated data generation pipeline utilizing advanced algorithms to exploit supervision from simulated worlds at scale. In our experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks, showcasing promising generalization capabilities. |
doi_str_mv | 10.48550/arxiv.2404.18243 |
format | Article |
fullrecord | (verbatim XML export of this record omitted) |
creationdate | 2024-04-28 |
rights | http://creativecommons.org/licenses/by/4.0 (free to read) |
link | https://arxiv.org/abs/2404.18243 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2404.18243 |
language | eng |
recordid | cdi_arxiv_primary_2404_18243 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer Science - Robotics |
title | LEGENT: Open Platform for Embodied Agents |