ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model
Format: Article
Language: English
Abstract: Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. In this paper, we present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion (SD) to address challenges such as misinterpreted styles and inconsistent semantics. Our approach introduces two innovative modules: the mixed style descriptor and the dynamic attention adapter. The mixed style descriptor enhances SD by combining content-aware and frequency-disentangled embeddings from CLIP with additional sources that capture global statistics and textual information, thus providing a richer blend of style-related and semantic-related knowledge. To achieve a better balance between adapter capacity and semantic control, the dynamic attention adapter is integrated into the diffusion UNet, dynamically calculating adaptation weights based on the style descriptors. Additionally, we introduce two objective functions to optimize the model alongside the denoising loss, further enhancing semantic and style consistency. Extensive experiments demonstrate the superiority of ArtWeaver over existing methods, producing images with diverse target styles while maintaining the semantic integrity of the text prompts.
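The abstract describes the mixed style descriptor only at a high level. As a rough illustration of the idea of fusing content-aware, frequency-disentangled, statistical, and textual cues into a single conditioning vector, here is a minimal PyTorch sketch; the FFT-based low-/high-pass split, the shared image encoder, and all function names below are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.fft as fft

def frequency_split(img: torch.Tensor, radius: int = 8):
    """Split an image batch (B, C, H, W) into low- and high-frequency parts
    using a centered circular mask in the Fourier domain (an assumed scheme)."""
    spec = fft.fftshift(fft.fft2(img), dim=(-2, -1))
    h, w = img.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2).to(img.dtype)
    low = fft.ifft2(fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, img - low  # low-frequency content, high-frequency residual

def mixed_style_descriptor(img, text_embed, encode_image):
    """Fuse content, frequency, statistical, and textual cues into one vector.
    `encode_image` stands in for a CLIP-style image encoder (hypothetical)."""
    low, high = frequency_split(img)
    content = encode_image(img)                    # content-aware embedding
    freq = encode_image(low) + encode_image(high)  # frequency-disentangled cue
    stats = torch.cat([img.mean(dim=(-2, -1)),     # global channel statistics
                       img.std(dim=(-2, -1))], dim=-1)
    return torch.cat([content, freq, stats, text_embed], dim=-1)
```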
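Likewise, the dynamic attention adapter is only sketched in the abstract. Below is a minimal illustration of style-conditioned adaptation weights, assuming a low-rank residual adapter whose gate is predicted from the style descriptor; the module name, the gating design, and all dimensions are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class DynamicAttentionAdapter(nn.Module):
    """Low-rank residual update to an attention projection whose strength
    (the adaptation weight) is predicted from the style descriptor."""
    def __init__(self, hidden_dim: int, style_dim: int, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # low-rank down-projection
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # low-rank up-projection
        self.gate = nn.Sequential(                           # style descriptor -> scalar weight
            nn.Linear(style_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, hidden: torch.Tensor, style_desc: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, tokens, hidden_dim); style_desc: (batch, style_dim)
        w = self.gate(style_desc).unsqueeze(1)               # (batch, 1, 1)
        return hidden + w * self.up(self.down(hidden))       # gated residual update
```

A per-sample sigmoid gate keeps the adapter's influence bounded, which is one simple way to trade adapter capacity against semantic control as the abstract motivates.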
DOI: 10.48550/arxiv.2405.15287