AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. T...


Bibliographic Details
Main Authors:	Wang, Jiuniu; Du, Zehua; Zhao, Yuyuan; Yuan, Bo; Wang, Kexiang; Liang, Jian; Zhao, Yaxi; Lu, Yihen; Li, Gengliang; Gao, Junlong; Tu, Xin; Guo, Zhenyu
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia
Online Access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, Zhenyu
description	The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.
doi_str_mv	10.48550/arxiv.2403.07952
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2403_07952</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403_07952</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-a959d953e4aa09961081a254599244a02ff2fa31d2e54d1a406e8f24fc9312843</originalsourceid><addsrcrecordid>eNotj8FKAzEURbNxIdUPcGV-IGPy8jKduBtK1UJBocXt8GgSCbSTkkkH5--1o6uzuVzOYexByQobY-QT5e84VoBSV3JpDdyyTeuHdG6_fF-e-Qzhchx9z9djOl5KTD3lie-mofgTTz3flZQnUZL4jM4n_pGTuxyuszt2E-g4-Pt_Ltj-Zb1fvYnt--tm1W4F1UsQZI111miPRNLaWslGERg01gIiSQgBAmnlwBt0ilDWvgmA4WC1ggb1gj3-3c4t3TnH069gd23q5ib9A9arRlQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production</title><source>arXiv.org</source><creator>Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, Zhenyu</creator><creatorcontrib>Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, Zhenyu</creatorcontrib><description>The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. 
The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.</description><identifier>DOI: 10.48550/arxiv.2403.07952</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Multimedia</subject><creationdate>2024-03</creationdate><rights>http://creativecommons.org/publicdomain/zero/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2403.07952$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2403.07952$$DView paper in 
arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Jiuniu</creatorcontrib><creatorcontrib>Du, Zehua</creatorcontrib><creatorcontrib>Zhao, Yuyuan</creatorcontrib><creatorcontrib>Yuan, Bo</creatorcontrib><creatorcontrib>Wang, Kexiang</creatorcontrib><creatorcontrib>Liang, Jian</creatorcontrib><creatorcontrib>Zhao, Yaxi</creatorcontrib><creatorcontrib>Lu, Yihen</creatorcontrib><creatorcontrib>Li, Gengliang</creatorcontrib><creatorcontrib>Gao, Junlong</creatorcontrib><creatorcontrib>Tu, Xin</creatorcontrib><creatorcontrib>Guo, Zhenyu</creatorcontrib><title>AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production</title><description>The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. 
The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Multimedia</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8FKAzEURbNxIdUPcGV-IGPy8jKduBtK1UJBocXt8GgSCbSTkkkH5--1o6uzuVzOYexByQobY-QT5e84VoBSV3JpDdyyTeuHdG6_fF-e-Qzhchx9z9djOl5KTD3lie-mofgTTz3flZQnUZL4jM4n_pGTuxyuszt2E-g4-Pt_Ltj-Zb1fvYnt--tm1W4F1UsQZI111miPRNLaWslGERg01gIiSQgBAmnlwBt0ilDWvgmA4WC1ggb1gj3-3c4t3TnH069gd23q5ib9A9arRlQ</recordid><startdate>20240311</startdate><enddate>20240311</enddate><creator>Wang, Jiuniu</creator><creator>Du, Zehua</creator><creator>Zhao, Yuyuan</creator><creator>Yuan, Bo</creator><creator>Wang, Kexiang</creator><creator>Liang, Jian</creator><creator>Zhao, Yaxi</creator><creator>Lu, Yihen</creator><creator>Li, Gengliang</creator><creator>Gao, Junlong</creator><creator>Tu, Xin</creator><creator>Guo, Zhenyu</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240311</creationdate><title>AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production</title><author>Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, 
Zhenyu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-a959d953e4aa09961081a254599244a02ff2fa31d2e54d1a406e8f24fc9312843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Multimedia</topic><toplevel>online_resources</toplevel><creatorcontrib>Wang, Jiuniu</creatorcontrib><creatorcontrib>Du, Zehua</creatorcontrib><creatorcontrib>Zhao, Yuyuan</creatorcontrib><creatorcontrib>Yuan, Bo</creatorcontrib><creatorcontrib>Wang, Kexiang</creatorcontrib><creatorcontrib>Liang, Jian</creatorcontrib><creatorcontrib>Zhao, Yaxi</creatorcontrib><creatorcontrib>Lu, Yihen</creatorcontrib><creatorcontrib>Li, Gengliang</creatorcontrib><creatorcontrib>Gao, Junlong</creatorcontrib><creatorcontrib>Tu, Xin</creatorcontrib><creatorcontrib>Guo, Zhenyu</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Jiuniu</au><au>Du, Zehua</au><au>Zhao, Yuyuan</au><au>Yuan, Bo</au><au>Wang, Kexiang</au><au>Liang, Jian</au><au>Zhao, Yaxi</au><au>Lu, Yihen</au><au>Li, Gengliang</au><au>Gao, Junlong</au><au>Tu, Xin</au><au>Guo, Zhenyu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production</atitle><date>2024-03-11</date><risdate>2024</risdate><abstract>The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. 
The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.</abstract><doi>10.48550/arxiv.2403.07952</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2403.07952
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2403_07952
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Multimedia
title	AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T06%3A45%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AesopAgent:%20Agent-driven%20Evolutionary%20System%20on%20Story-to-Video%20Production&rft.au=Wang,%20Jiuniu&rft.date=2024-03-11&rft_id=info:doi/10.48550/arxiv.2403.07952&rft_dat=%3Carxiv_GOX%3E2403_07952%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true