What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be f...
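Published in: | arXiv.org 2024-08 |
---|---|
Main authors: | Liu, Yilun ; He, Minggui ; Yao, Feiyu ; Ji, Yuhe ; Tao, Shimin ; Du, Jingzhou ; Li, Duan ; Gao, Jian ; Zhang, Li ; Yang, Hao ; Chen, Boxing ; Yoshie, Osamu |
Format: | Article |
Language: | eng |
Subjects: | Datasets ; Digital imaging ; Image quality ; Prompt engineering ; Queries ; Synthesis ; User satisfaction ; Workflow |
Online access: | Full text |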
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Liu, Yilun ; He, Minggui ; Yao, Feiyu ; Ji, Yuhe ; Tao, Shimin ; Du, Jingzhou ; Li, Duan ; Gao, Jian ; Zhang, Li ; Yang, Hao ; Chen, Boxing ; Yoshie, Osamu |
description | The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model queries user with their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3097278530</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3097278530</sourcerecordid><originalsourceid>FETCH-proquest_journals_30972785303</originalsourceid><addsrcrecordid>eNqNyr0KwjAUQOEgCBb1HS44B2JirU4O_i6CoCJOEtpbTWkTTW5E314HH8DpDOdrsUQqNeSTkZQd1g-hEkLIcSbTVCWsON00wcLB2UU4aUszOAb0PEdL3uSw8665E6zRotdknIXSeTjgizg5bhp9Rdi_Ld0wmABPo2EbazKcorewjqbQNscea5e6Dtj_tcsGq-VhvuF37x4RA10q9_XfdVFimslskiqh_lMfoIpFtw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3097278530</pqid></control><display><type>article</type><title>What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance</title><source>Freely Accessible Journals</source><creator>Liu, Yilun ; He, Minggui ; Yao, Feiyu ; Ji, Yuhe ; Tao, Shimin ; Du, Jingzhou ; Li, Duan ; Gao, Jian ; Zhang, Li ; Yang, Hao ; Chen, Boxing ; Yoshie, Osamu</creator><creatorcontrib>Liu, Yilun ; He, Minggui ; Yao, Feiyu ; Ji, Yuhe ; Tao, Shimin ; Du, Jingzhou ; Li, Duan ; Gao, Jian ; Zhang, Li ; Yang, Hao ; Chen, Boxing ; Yoshie, Osamu</creatorcontrib><description>The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. 
DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model queries user with their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Datasets ; Digital imaging ; Image quality ; Prompt engineering ; Queries ; Synthesis ; User satisfaction ; Workflow</subject><ispartof>arXiv.org, 2024-08</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>778,782</link.rule.ids></links><search><creatorcontrib>Liu, Yilun</creatorcontrib><creatorcontrib>He, Minggui</creatorcontrib><creatorcontrib>Yao, Feiyu</creatorcontrib><creatorcontrib>Ji, Yuhe</creatorcontrib><creatorcontrib>Tao, Shimin</creatorcontrib><creatorcontrib>Du, Jingzhou</creatorcontrib><creatorcontrib>Li, Duan</creatorcontrib><creatorcontrib>Gao, Jian</creatorcontrib><creatorcontrib>Zhang, Li</creatorcontrib><creatorcontrib>Yang, Hao</creatorcontrib><creatorcontrib>Chen, Boxing</creatorcontrib><creatorcontrib>Yoshie, Osamu</creatorcontrib><title>What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance</title><title>arXiv.org</title><description>The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. 
DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model queries user with their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers.</description><subject>Datasets</subject><subject>Digital imaging</subject><subject>Image quality</subject><subject>Prompt engineering</subject><subject>Queries</subject><subject>Synthesis</subject><subject>User satisfaction</subject><subject>Workflow</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNyr0KwjAUQOEgCBb1HS44B2JirU4O_i6CoCJOEtpbTWkTTW5E314HH8DpDOdrsUQqNeSTkZQd1g-hEkLIcSbTVCWsON00wcLB2UU4aUszOAb0PEdL3uSw8665E6zRotdknIXSeTjgizg5bhp9Rdi_Ld0wmABPo2EbazKcorewjqbQNscea5e6Dtj_tcsGq-VhvuF37x4RA10q9_XfdVFimslskiqh_lMfoIpFtw</recordid><startdate>20240823</startdate><enddate>20240823</enddate><creator>Liu, Yilun</creator><creator>He, Minggui</creator><creator>Yao, Feiyu</creator><creator>Ji, Yuhe</creator><creator>Tao, 
Shimin</creator><creator>Du, Jingzhou</creator><creator>Li, Duan</creator><creator>Gao, Jian</creator><creator>Zhang, Li</creator><creator>Yang, Hao</creator><creator>Chen, Boxing</creator><creator>Yoshie, Osamu</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240823</creationdate><title>What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance</title><author>Liu, Yilun ; He, Minggui ; Yao, Feiyu ; Ji, Yuhe ; Tao, Shimin ; Du, Jingzhou ; Li, Duan ; Gao, Jian ; Zhang, Li ; Yang, Hao ; Chen, Boxing ; Yoshie, Osamu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_30972785303</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Datasets</topic><topic>Digital imaging</topic><topic>Image quality</topic><topic>Prompt engineering</topic><topic>Queries</topic><topic>Synthesis</topic><topic>User satisfaction</topic><topic>Workflow</topic><toplevel>online_resources</toplevel><creatorcontrib>Liu, Yilun</creatorcontrib><creatorcontrib>He, Minggui</creatorcontrib><creatorcontrib>Yao, Feiyu</creatorcontrib><creatorcontrib>Ji, Yuhe</creatorcontrib><creatorcontrib>Tao, Shimin</creatorcontrib><creatorcontrib>Du, Jingzhou</creatorcontrib><creatorcontrib>Li, Duan</creatorcontrib><creatorcontrib>Gao, Jian</creatorcontrib><creatorcontrib>Zhang, Li</creatorcontrib><creatorcontrib>Yang, Hao</creatorcontrib><creatorcontrib>Chen, Boxing</creatorcontrib><creatorcontrib>Yoshie, Osamu</creatorcontrib><collection>ProQuest 
SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Yilun</au><au>He, Minggui</au><au>Yao, Feiyu</au><au>Ji, Yuhe</au><au>Tao, Shimin</au><au>Du, Jingzhou</au><au>Li, Duan</au><au>Gao, Jian</au><au>Zhang, Li</au><au>Yang, Hao</au><au>Chen, Boxing</au><au>Yoshie, Osamu</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance</atitle><jtitle>arXiv.org</jtitle><date>2024-08-23</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. 
Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model queries user with their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-08 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3097278530 |
source | Freely Accessible Journals |
subjects | Datasets ; Digital imaging ; Image quality ; Prompt engineering ; Queries ; Synthesis ; User satisfaction ; Workflow |
title | What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T16%3A48%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=What%20Do%20You%20Want?%20User-centric%20Prompt%20Generation%20for%20Text-to-image%20Synthesis%20via%20Multi-turn%20Guidance&rft.jtitle=arXiv.org&rft.au=Liu,%20Yilun&rft.date=2024-08-23&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3097278530%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3097278530&rft_id=info:pmid/&rfr_iscdi=true |
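The `url` field above is an OpenURL (its `ctx_ver` is `Z39.88-2004`), so the citation details travel as percent-encoded `rft.*` query keys. The following is a minimal sketch of how those keys could be read back with Python's standard library. The `OPENURL` constant is an abbreviated copy of the record's URL (several keys are left out for brevity), and the helper name `citation_fields` is hypothetical, chosen only for this illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Abbreviated copy of the record's url field; only some keys are kept here.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=document"
    "&rft.atitle=What%20Do%20You%20Want?%20User-centric%20Prompt%20Generation"
    "%20for%20Text-to-image%20Synthesis%20via%20Multi-turn%20Guidance"
    "&rft.jtitle=arXiv.org&rft.au=Liu,%20Yilun&rft.date=2024-08-23"
    "&rft.eissn=2331-8422&rft_pqid=3097278530"
)


def citation_fields(openurl: str) -> dict[str, str]:
    """Return the rft.* keys of an OpenURL as a flat dict (hypothetical helper)."""
    query = urlsplit(openurl).query  # everything after the first '?'
    pairs = parse_qs(query)          # percent-decodes; each value is a list
    # Keep only keys that start with "rft." and strip that prefix.
    return {key[4:]: values[0] for key, values in pairs.items()
            if key.startswith("rft.")}


fields = citation_fields(OPENURL)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Keys such as `rft_pqid` use an underscore rather than a dot, so this sketch skips them; a fuller reader would need to decide how to treat those and the repeated `rft_id` keys in the complete URL.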