DiCTI: Diffusion-based Clothing Designer via Text-guided Input
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Recent developments in deep generative models have opened up a wide range of
opportunities for image synthesis, leading to significant changes in various
creative fields, including the fashion industry. While numerous methods have
been proposed to benefit buyers, particularly in virtual try-on applications,
there has been relatively little focus on facilitating fast prototyping for
designers and customers seeking to order new designs. To address this gap, we
introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a
straightforward yet highly effective approach that allows designers to quickly
visualize fashion-related ideas using text inputs only. Given an image of a
person and a description of the desired garments as input, DiCTI automatically
generates multiple high-resolution, photorealistic images that capture the
expressed semantics. By leveraging a powerful diffusion-based inpainting model
conditioned on text inputs, DiCTI is able to synthesize convincing,
high-quality images with varied clothing designs that viably follow the
provided text descriptions, while being able to process very diverse and
challenging inputs, captured in completely unconstrained settings. We evaluate
DiCTI in comprehensive experiments on two different datasets (VITON-HD and
Fashionpedia) and in comparison to the state of the art (SoTA). The results of
our experiments show that DiCTI convincingly outperforms the SoTA competitor in
generating higher quality images with more elaborate garments and superior text
prompt adherence, both according to standard quantitative evaluation measures
and human ratings, generated as part of a user study. |
---|---|
DOI: | 10.48550/arxiv.2407.03901 |