MaskSketch: Unpaired Structure-guided Masked Image Generation
Format: Article
Language: English
Online Access: Order full text
Abstract: Recent conditional image generation methods produce images of remarkable diversity, fidelity and realism. However, the majority of these methods allow conditioning only on labels or text prompts, which limits their level of control over the generation result. In this paper, we introduce MaskSketch, an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling. MaskSketch utilizes a pre-trained masked generative transformer, requiring no model training or paired supervision, and works with input sketches of different levels of abstraction. We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation. Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure. Evaluated on standard benchmark datasets, MaskSketch outperforms state-of-the-art methods for sketch-to-image translation, as well as unpaired image-to-image translation approaches.
DOI: 10.48550/arxiv.2302.05496
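The abstract's central idea, using the self-attention maps of a masked generative transformer as a structure signal to steer sampling, can be illustrated with a minimal sketch. The following PyTorch code is an illustrative approximation under stated assumptions, not the paper's implementation: `structure_distance`, `guided_step`, `dummy_model`, and the candidate-reranking loop are hypothetical names, and it assumes a MaskGIT-style model that returns token logits together with its self-attention maps.

```python
import torch
import torch.nn.functional as F


def structure_distance(attn_a, attn_b):
    # Mean cosine distance between two stacks of self-attention maps,
    # each shaped (layers, heads, tokens, tokens). Lower = more similar structure.
    a = attn_a.reshape(*attn_a.shape[:2], -1)
    b = attn_b.reshape(*attn_b.shape[:2], -1)
    sim = F.cosine_similarity(a, b, dim=-1)  # (layers, heads)
    return 1.0 - sim.mean()


def guided_step(model, tokens, mask, sketch_attn, n_candidates=8, temperature=1.0):
    # One masked-decoding step: sample several candidate completions and keep
    # the one whose attention maps are structurally closest to the sketch's.
    best, best_dist = None, float("inf")
    for _ in range(n_candidates):
        logits, _ = model(tokens)
        probs = F.softmax(logits / temperature, dim=-1)
        sampled = torch.multinomial(probs, 1).squeeze(-1)
        candidate = torch.where(mask, sampled, tokens)  # fill only masked slots
        _, attn = model(candidate)  # re-encode candidate to read its attention
        dist = structure_distance(attn, sketch_attn).item()
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


if __name__ == "__main__":
    layers, heads, seq, vocab = 4, 8, 16, 256

    def dummy_model(tokens):
        # Stand-in for a pre-trained masked generative transformer: returns
        # random logits and attention maps purely so the sketch runs end to end.
        logits = torch.randn(seq, vocab)
        attn = torch.softmax(torch.randn(layers, heads, seq, seq), dim=-1)
        return logits, attn

    _, sketch_attn = dummy_model(torch.randint(0, vocab, (seq,)))
    tokens = torch.randint(0, vocab, (seq,))
    mask = torch.rand(seq) < 0.5  # True where tokens are still masked
    _, dist = guided_step(dummy_model, tokens, mask, sketch_attn)
    print(f"structure distance of chosen candidate: {dist:.4f}")
```

In the paper this kind of structure scoring is applied within the iterative parallel-decoding schedule of a pre-trained masked transformer; the single `guided_step` above compresses that process into one illustrative step.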