GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Recent advances in text-to-video generation have harnessed the power of diffusion models to create visually compelling content conditioned on text prompts. However, they usually incur high computational costs and often struggle to produce videos with coherent physical motions. To tackle these issues, we propose GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis. Specifically, GPT4Motion employs GPT-4 to generate a Blender script based on a user textual prompt, which commands Blender's built-in physics engine to craft fundamental scene components that encapsulate coherent physical motions across frames. These components are then fed into Stable Diffusion to generate a video aligned with the textual prompt. Experimental results on three basic physical motion scenarios, including rigid object drop and collision, cloth draping and swinging, and liquid flow, demonstrate that GPT4Motion can efficiently generate high-quality videos while maintaining motion coherency and entity consistency. GPT4Motion offers new insights into text-to-video research, enhancing its quality and broadening its horizon for further explorations.
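
The pipeline described above has two stages. In the first, GPT-4 turns the user prompt into a Blender script that drives the built-in physics engine. The snippet below is a minimal sketch, assuming the "rigid object drop and collision" scenario, of the kind of Blender Python (bpy) script such a stage could produce; the object sizes, frame range, camera and light placement, and output path are illustrative assumptions, not the paper's actual generated output.

```python
import bpy

# Ground plane acting as a passive collider.
bpy.ops.mesh.primitive_plane_add(size=10, location=(0, 0, 0))
bpy.ops.rigidbody.object_add()
bpy.context.object.rigid_body.type = 'PASSIVE'

# Sphere dropped from above as an active rigid body; gravity does the rest.
bpy.ops.mesh.primitive_uv_sphere_add(radius=0.5, location=(0, 0, 5))
bpy.ops.rigidbody.object_add()
bpy.context.object.rigid_body.type = 'ACTIVE'

# Camera and light so the clip can be rendered without relying on a
# pre-populated scene (placement values are arbitrary).
bpy.ops.object.camera_add(location=(8, -8, 6), rotation=(1.1, 0, 0.785))
bpy.context.scene.camera = bpy.context.object
bpy.ops.object.light_add(type='SUN', location=(0, 0, 10))

# Simulate a short clip; the physics engine animates the motion per frame.
scene = bpy.context.scene
scene.frame_start = 1
scene.frame_end = 60

# Render the per-frame scene components to disk (path is illustrative).
scene.render.filepath = '/tmp/gpt4motion_frames/frame_'
bpy.ops.render.render(animation=True)
```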

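In the second stage, the rendered scene components condition Stable Diffusion. The abstract does not specify the conditioning mechanism, so the sketch below assumes a ControlNet-style setup via Hugging Face diffusers, conditioning each frame on an edge map derived from the simulation render; the model identifiers, the edge-map conditioning, and the fixed seed for cross-frame consistency are assumptions for illustration, not details confirmed by the paper.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical per-frame conditioning: a Blender render converted to a Canny
# edge map (assumed precomputed) steers generation toward the simulated motion.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a basketball falling onto a wooden floor, photorealistic"
edge_map = Image.open("/tmp/gpt4motion_frames/frame_0001_canny.png")

# Reusing one seed across frames is one simple way to encourage entity
# consistency; the paper's actual cross-frame strategy may differ.
generator = torch.Generator("cuda").manual_seed(0)
frame = pipe(prompt, image=edge_map, generator=generator).images[0]
frame.save("frame_0001.png")
```
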
Bibliographic details
Published in: arXiv.org, 2024-04
Main authors: Lv, Jiaxi; Huang, Yi; Yan, Mingfu; Huang, Jiancheng; Liu, Jianzhuang; Liu, Yifan; Wen, Yafei; Chen, Xiaoxin; Chen, Shifeng
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Coherence; Image enhancement; Image processing; Image quality; Large language models; Liquid flow; Physical simulation; Video
Online access: Full text