SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Human visual imagination usually begins with analogies or rough sketches. For
example, given an image of a girl playing guitar in front of a building, one may
analogously imagine how it would look if Iron Man were playing guitar in front of
the Pyramids in Egypt. However, the visual condition may not be precisely aligned
with the imagined result indicated by the text prompt, and existing
layout-controllable text-to-image (T2I) generation models are prone to producing
degraded results with obvious artifacts. To address this issue, we present a novel
T2I generation method dubbed SmartControl, which is designed to modify rough
visual conditions so that they adapt to the text prompt. The key idea of SmartControl
is to relax the visual condition in the areas that conflict with the text
prompt. Specifically, a Control Scale Predictor (CSP) is designed to identify
the conflict regions and predict the local control scales, and a dataset of
text prompts and rough visual conditions is constructed for training the CSP. It is
worth noting that, even with a limited number (e.g., 1,000 to 2,000) of training
samples, SmartControl can generalize well to unseen objects. Extensive
experiments on four typical visual condition types clearly show the efficacy of
SmartControl against state-of-the-art methods. Source code, pre-trained models,
and datasets are available at https://github.com/liuxiaoyu1104/SmartControl. |
---|---|
DOI: | 10.48550/arxiv.2404.06451 |
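
The summary describes relaxing the visual condition only in regions that conflict with the text prompt, using local control scales. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: it assumes the control signal is added to backbone features as a residual, and that a per-pixel scale map in [0, 1] is already available. In the paper that map is predicted by the CSP; here it is written by hand. The function name `apply_local_control` and the nested-list layout `[C][H][W]` are hypothetical choices made for this example.

```python
def apply_local_control(backbone, control, scale_map):
    """Add a control residual to backbone features, weighted per pixel.

    backbone, control: nested lists shaped [C][H][W] (hypothetical layout).
    scale_map: nested list shaped [H][W]; values are clipped to [0, 1].
    A scale of 1 keeps the visual condition, 0 drops it at that location.
    """
    out = []
    for b_ch, c_ch in zip(backbone, control):
        out.append([
            [b + min(max(s, 0.0), 1.0) * c
             for b, c, s in zip(b_row, c_row, s_row)]
            for b_row, c_row, s_row in zip(b_ch, c_ch, scale_map)
        ])
    return out


# One channel, 2x2 feature map. The right column is treated as a
# conflict region, so the control signal is relaxed there.
backbone = [[[0.0, 0.0], [0.0, 0.0]]]
control = [[[1.0, 1.0], [1.0, 1.0]]]
scale_map = [[1.0, 0.0], [1.0, 0.25]]  # hand-written stand-in for a predicted map

result = apply_local_control(backbone, control, scale_map)
```

With a uniform scale map of 1.0 this reduces to plain residual addition, so the spatial map is the only thing that distinguishes the conflict regions from the rest of the image in this sketch.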