Generative Disco: Text-to-Video Generation for Music Visualization

Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate m...

Bibliographic details
Main authors:	Liu, Vivian ; Long, Tao ; Raw, Nathan ; Chilton, Lydia
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence ; Computer Science - Human-Computer Interaction
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Liu, Vivian ; Long, Tao ; Raw, Nathan ; Chilton, Lydia
description	Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation. The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects. A study with professionals showed that transitions and holds were a highly expressive framework that enabled them to build coherent visual narratives. We conclude on the generalizability of these patterns and the potential of generated video for creative professionals.
doi_str_mv	10.48550/arxiv.2304.08551
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2304_08551</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2304_08551</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-cbe7bbbc5e543ae498185051be34514c5849803ccf5083358f43a37ef3376a1b3</originalsourceid><addsrcrecordid>eNo1j8FqAjEURbNxIdoPcGV-INOkL8-J3bW2asHiZnA7JPEFAtaUzCi2X99xrKsL514uHMYmShbaIMpHmy_xXDyB1IXsgBqy1xUdKds2nom_xcanZ17RpRVtEru4p8TvfTrykDL_PDXR811sTvYQf3s-ZoNgDw09_OeIVcv3arEWm-3qY_GyEXZWKuEdlc45j4QaLOm5UQYlKkegUWmPpkMSvA8oDQCa0M2gpABQzqxyMGLT220vUX_n-GXzT32VqXsZ-AOK-UP2</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Generative Disco: Text-to-Video Generation for Music Visualization</title><source>arXiv.org</source><creator>Liu, Vivian ; Long, Tao ; Raw, Nathan ; Chilton, Lydia</creator><creatorcontrib>Liu, Vivian ; Long, Tao ; Raw, Nathan ; Chilton, Lydia</creatorcontrib><description>Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation. The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects. A study with professionals showed that transitions and holds were a highly expressive framework that enabled them to build coherent visual narratives. 
We conclude on the generalizability of these patterns and the potential of generated video for creative professionals.</description><identifier>DOI: 10.48550/arxiv.2304.08551</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Human-Computer Interaction</subject><creationdate>2023-04</creationdate><rights>http://creativecommons.org/licenses/by-nc-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2304.08551$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2304.08551$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Vivian</creatorcontrib><creatorcontrib>Long, Tao</creatorcontrib><creatorcontrib>Raw, Nathan</creatorcontrib><creatorcontrib>Chilton, Lydia</creatorcontrib><title>Generative Disco: Text-to-Video Generation for Music Visualization</title><description>Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation. The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects. 
A study with professionals showed that transitions and holds were a highly expressive framework that enabled them to build coherent visual narratives. We conclude on the generalizability of these patterns and the potential of generated video for creative professionals.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Human-Computer Interaction</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNo1j8FqAjEURbNxIdoPcGV-INOkL8-J3bW2asHiZnA7JPEFAtaUzCi2X99xrKsL514uHMYmShbaIMpHmy_xXDyB1IXsgBqy1xUdKds2nom_xcanZ17RpRVtEru4p8TvfTrykDL_PDXR811sTvYQf3s-ZoNgDw09_OeIVcv3arEWm-3qY_GyEXZWKuEdlc45j4QaLOm5UQYlKkegUWmPpkMSvA8oDQCa0M2gpABQzqxyMGLT220vUX_n-GXzT32VqXsZ-AOK-UP2</recordid><startdate>20230417</startdate><enddate>20230417</enddate><creator>Liu, Vivian</creator><creator>Long, Tao</creator><creator>Raw, Nathan</creator><creator>Chilton, Lydia</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230417</creationdate><title>Generative Disco: Text-to-Video Generation for Music Visualization</title><author>Liu, Vivian ; Long, Tao ; Raw, Nathan ; Chilton, Lydia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-cbe7bbbc5e543ae498185051be34514c5849803ccf5083358f43a37ef3376a1b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Human-Computer Interaction</topic><toplevel>online_resources</toplevel><creatorcontrib>Liu, Vivian</creatorcontrib><creatorcontrib>Long, Tao</creatorcontrib><creatorcontrib>Raw, Nathan</creatorcontrib><creatorcontrib>Chilton, Lydia</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liu, Vivian</au><au>Long, Tao</au><au>Raw, Nathan</au><au>Chilton, Lydia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Generative Disco: Text-to-Video Generation for Music Visualization</atitle><date>2023-04-17</date><risdate>2023</risdate><abstract>Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation. The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects. A study with professionals showed that transitions and holds were a highly expressive framework that enabled them to build coherent visual narratives. We conclude on the generalizability of these patterns and the potential of generated video for creative professionals.</abstract><doi>10.48550/arxiv.2304.08551</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2304.08551
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2304_08551
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Human-Computer Interaction
title	Generative Disco: Text-to-Video Generation for Music Visualization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T03%3A22%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generative%20Disco:%20Text-to-Video%20Generation%20for%20Music%20Visualization&rft.au=Liu,%20Vivian&rft.date=2023-04-17&rft_id=info:doi/10.48550/arxiv.2304.08551&rft_dat=%3Carxiv_GOX%3E2304_08551%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true