AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. T...
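Main authors: | Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, Zhenyu |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |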
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wang, Jiuniu ; Du, Zehua ; Zhao, Yuyuan ; Yuan, Bo ; Wang, Kexiang ; Liang, Jian ; Zhao, Yaxi ; Lu, Yihen ; Li, Gengliang ; Gao, Junlong ; Tu, Xin ; Guo, Zhenyu |
description | Agent and AIGC (Artificial Intelligence Generated Content) technologies
have recently made significant progress. We propose AesopAgent, an Agent-driven
Evolutionary System on Story-to-Video Production. AesopAgent is a practical
application of agent technology for multimodal content generation. The system
integrates multiple generative capabilities within a unified framework, so that
individual users can use these modules easily. It converts user story
proposals into scripts, images, and audio, and then integrates this
multimodal content into videos. Additionally, animating units (e.g., Gen-2
and Sora) can make the videos more engaging. AesopAgent orchestrates the
task workflow for video generation, ensuring that the generated video is both
rich in content and coherent. The system has two main layers: the Horizontal
Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel
RAG-based evolutionary system that optimizes both the overall video generation
workflow and the individual steps within it. It evolves continuously,
iteratively optimizing the workflow by accumulating expert experience and
professional knowledge, including optimizing the LLM prompts and the use of
utilities. The Utility Layer provides multiple utilities that yield image
generation that is visually consistent in composition, characters, and style.
It also provides audio and special effects, integrating them into expressive
and logically arranged videos. Overall, AesopAgent achieves state-of-the-art
performance compared with many previous works in visual storytelling.
AesopAgent is designed as a convenient service for individual users and is
available at https://aesopai.github.io/. |
doi_str_mv | 10.48550/arxiv.2403.07952 |
format | Article |
creationdate | 2024-03-11 |
rights | http://creativecommons.org/publicdomain/zero/1.0 |
oa | free_for_read |
linktorsrc | https://arxiv.org/abs/2403.07952 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2403.07952 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2403_07952 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Multimedia |
title | AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T06%3A45%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AesopAgent:%20Agent-driven%20Evolutionary%20System%20on%20Story-to-Video%20Production&rft.au=Wang,%20Jiuniu&rft.date=2024-03-11&rft_id=info:doi/10.48550/arxiv.2403.07952&rft_dat=%3Carxiv_GOX%3E2403_07952%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
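
The description field above outlines a workflow in which a story becomes a script, images, and audio, which are then assembled into a video, while a retrieval-based store of accumulated experience is used to refine the prompts for each step. The Python sketch below is only an illustration of that idea under stated assumptions: `ExperienceStore`, `build_prompt`, `run_workflow`, and the prompt wording are hypothetical names invented here, not AesopAgent's actual code or API, and no real generator is called.

```python
from dataclasses import dataclass, field

# Step order taken from the abstract: story -> script, images, audio -> video.
STEPS = ("script", "images", "audio")


@dataclass
class ExperienceStore:
    """Hypothetical store of accumulated guidance, keyed by workflow step."""
    notes: dict = field(default_factory=dict)

    def add(self, step: str, note: str) -> None:
        self.notes.setdefault(step, []).append(note)

    def retrieve(self, step: str) -> list:
        return list(self.notes.get(step, []))


def build_prompt(step: str, story: str, store: ExperienceStore) -> str:
    """Build the prompt for one step, adding any guidance retrieved for it."""
    parts = [f"Task: produce the {step} for this story.", f"Story: {story}"]
    guidance = store.retrieve(step)
    if guidance:
        parts.append("Guidance: " + "; ".join(guidance))
    return "\n".join(parts)


def run_workflow(story: str, store: ExperienceStore) -> dict:
    """Return the prompt each step would send to its (unspecified) generator."""
    prompts = {step: build_prompt(step, story, store) for step in STEPS}
    # Video assembly is only recorded here as an ordered list of its inputs.
    prompts["video"] = "assemble: " + ", ".join(STEPS)
    return prompts


store = ExperienceStore()
first = run_workflow("The tortoise and the hare", store)
# One refinement round: record feedback, then rerun with the updated store.
store.add("images", "keep character appearance consistent across shots")
second = run_workflow("The tortoise and the hare", store)
```

After the refinement round, only the prompt for the `images` step changes. This is meant to mirror, in a very reduced form, the abstract's claim that accumulated experience is fed back into the LLM prompts. How AesopAgent actually retrieves and applies that experience is not specified in this record.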