Personalized and Sequential Text-to-Image Generation
Saved in:
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Order full text |
Abstract: | We address the problem of personalized, interactive text-to-image (T2I)
generation, designing a reinforcement learning (RL) agent which iteratively
improves a set of generated images for a user through a sequence of prompt
expansions. Using human raters, we create a novel dataset of sequential
preferences, which we leverage, together with large-scale open-source
(non-sequential) datasets. We construct user-preference and user-choice models
using an EM strategy and identify varying user preference types. We then
leverage a large multimodal language model (LMM) and a value-based RL approach
to suggest a personalized and diverse slate of prompt expansions to the user.
Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models
with personalized multi-turn capabilities, fostering collaborative co-creation
and addressing uncertainty or underspecification in a user's intent. We
evaluate PASTA using human raters, showing significant improvement compared to
baseline methods. We also release our sequential rater dataset and simulated
user-rater interactions to support future research in personalized, multi-turn
T2I generation. |
---|---|
DOI: | 10.48550/arxiv.2412.10419 |
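The abstract mentions constructing user-preference and user-choice models with an EM strategy to identify varying user preference types. A toy sketch of that idea, using a mixture-of-Bernoullis EM over binary accept/reject feedback, is shown below; the model family, function names, and data here are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def em_user_types(choices, n_types=2, n_iter=50, seed=0):
    """Toy EM clustering users into latent preference types.

    choices: (n_users, n_items) binary matrix of accept/reject feedback.
    Returns type priors, per-type acceptance probabilities, and
    per-user responsibilities (posterior over types).
    NOTE: an illustrative mixture-of-Bernoullis sketch, not PASTA's model.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = choices.shape
    pi = np.full(n_types, 1.0 / n_types)                  # type priors
    theta = rng.uniform(0.25, 0.75, (n_types, n_items))   # per-type accept probs

    for _ in range(n_iter):
        # E-step: log-posterior over types for each user.
        log_p = (choices @ np.log(theta).T
                 + (1 - choices) @ np.log(1 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update priors and (smoothed) acceptance probabilities.
        pi = resp.mean(axis=0)
        theta = (resp.T @ choices + 1.0) / (resp.sum(axis=0)[:, None] + 2.0)

    return pi, theta, resp

# Synthetic data: one user type prefers the first half of the items,
# the other prefers the second half.
a = np.tile([1, 1, 1, 0, 0, 0], (10, 1))
b = np.tile([0, 0, 0, 1, 1, 1], (10, 1))
X = np.vstack([a, b]).astype(float)

pi, theta, resp = em_user_types(X, n_types=2)
types = resp.argmax(axis=1)  # recovered preference type per user
```

On clean synthetic data like this, the E-step responsibilities separate the two user groups; real interaction data would of course be noisier and call for more types and regularization.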