TEAdapter: Supply abundant guidance for controllable text-to-music generation

2024 IEEE International Conference on Multimedia and Expo (ICME 2024) Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. A...


Bibliographic details
Main authors:	Zou, Jialing; Mei, Jiahao; Nan, Xudong; Li, Jinghua; Dong, Daoguo; He, Liang
Format:	Article
Language:	eng
Subjects:	Computer Science - Multimedia; Computer Science - Sound
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Zou, Jialing; Mei, Jiahao; Nan, Xudong; Li, Jinghua; Dong, Daoguo; He, Liang
description	2024 IEEE International Conference on Multimedia and Expo (ICME 2024). Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. Overall, we consider controls at the global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Code and demos will be available soon at https://github.com/Ashley1101/TEAdapter.
doi_str_mv	10.48550/arxiv.2408.04865
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2408_04865</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2408_04865</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2408_048653</originalsourceid><addsrcrecordid>eNqFjrEOgjAUALs4GPUDnOwPgFXBEDdjMC5OupMHPEiT0jaPVwN_rxJ3p1suuRNivVNxkqWp2gIN-hXvE5XFKsmO6Vzcn_m5Bs9IJ_kI3ptRQhlsDZZlG_SHFcrGkaycZXLGQGlQMg4csYu60OtKtmiRgLWzSzFrwPS4-nEhNtf8eblFU7jwpDugsfgOFNPA4b_xBmUBPBU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>TEAdapter: Supply abundant guidance for controllable text-to-music generation</title><source>arXiv.org</source><creator>Zou, Jialing ; Mei, Jiahao ; Nan, Xudong ; Li, Jinghua ; Dong, Daoguo ; He, Liang</creator><creatorcontrib>Zou, Jialing ; Mei, Jiahao ; Nan, Xudong ; Li, Jinghua ; Dong, Daoguo ; He, Liang</creatorcontrib><description>2024 IEEE International Conference on Multimedia and Expo (ICME 2024) Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. 
Available code and demos will be found soon at https://github.com/Ashley1101/TEAdapter.</description><identifier>DOI: 10.48550/arxiv.2408.04865</identifier><language>eng</language><subject>Computer Science - Multimedia ; Computer Science - Sound</subject><creationdate>2024-08</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2408.04865$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2408.04865$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zou, Jialing</creatorcontrib><creatorcontrib>Mei, Jiahao</creatorcontrib><creatorcontrib>Nan, Xudong</creatorcontrib><creatorcontrib>Li, Jinghua</creatorcontrib><creatorcontrib>Dong, Daoguo</creatorcontrib><creatorcontrib>He, Liang</creatorcontrib><title>TEAdapter: Supply abundant guidance for controllable text-to-music generation</title><description>2024 IEEE International Conference on Multimedia and Expo (ICME 2024) Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. 
Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Available code and demos will be found soon at https://github.com/Ashley1101/TEAdapter.</description><subject>Computer Science - Multimedia</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjrEOgjAUALs4GPUDnOwPgFXBEDdjMC5OupMHPEiT0jaPVwN_rxJ3p1suuRNivVNxkqWp2gIN-hXvE5XFKsmO6Vzcn_m5Bs9IJ_kI3ptRQhlsDZZlG_SHFcrGkaycZXLGQGlQMg4csYu60OtKtmiRgLWzSzFrwPS4-nEhNtf8eblFU7jwpDugsfgOFNPA4b_xBmUBPBU</recordid><startdate>20240809</startdate><enddate>20240809</enddate><creator>Zou, Jialing</creator><creator>Mei, Jiahao</creator><creator>Nan, Xudong</creator><creator>Li, Jinghua</creator><creator>Dong, Daoguo</creator><creator>He, Liang</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240809</creationdate><title>TEAdapter: Supply abundant guidance for controllable text-to-music generation</title><author>Zou, Jialing ; Mei, Jiahao ; Nan, Xudong ; Li, Jinghua ; Dong, Daoguo ; He, Liang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2408_048653</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Multimedia</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Zou, Jialing</creatorcontrib><creatorcontrib>Mei, Jiahao</creatorcontrib><creatorcontrib>Nan, Xudong</creatorcontrib><creatorcontrib>Li, Jinghua</creatorcontrib><creatorcontrib>Dong, Daoguo</creatorcontrib><creatorcontrib>He, Liang</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zou, Jialing</au><au>Mei, Jiahao</au><au>Nan, Xudong</au><au>Li, Jinghua</au><au>Dong, Daoguo</au><au>He, Liang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TEAdapter: Supply abundant guidance for controllable text-to-music generation</atitle><date>2024-08-09</date><risdate>2024</risdate><abstract>2024 IEEE International Conference on Multimedia and Expo (ICME 2024) Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Available code and demos will be found soon at https://github.com/Ashley1101/TEAdapter.</abstract><doi>10.48550/arxiv.2408.04865</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2408.04865
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2408_04865
source	arXiv.org
subjects	Computer Science - Multimedia; Computer Science - Sound
title	TEAdapter: Supply abundant guidance for controllable text-to-music generation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T18%3A13%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TEAdapter:%20Supply%20abundant%20guidance%20for%20controllable%20text-to-music%20generation&rft.au=Zou,%20Jialing&rft.date=2024-08-09&rft_id=info:doi/10.48550/arxiv.2408.04865&rft_dat=%3Carxiv_GOX%3E2408_04865%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true