Controlling StyleGANs using rough scribbles via one‐shot learning

Detailed description

Bibliographic details
Published in:	Computer animation and virtual worlds 2022-09, Vol.33 (5), p.n/a
Main authors:	Endo, Yuki; Kanamori, Yoshihiro
Format:	Article
Language:	English
Subjects:	Annotations; Coders; GAN inversion; generative adversarial networks; image editing; Image quality; Layouts; Optimization; Random noise; Semantics; Synthesis; Training
Online access:	Full text

Description
Abstract:	This paper tackles the challenging problem of one‐shot semantic image synthesis from rough sparse annotations, which we call “semantic scribbles.” Namely, from only a single training pair annotated with semantic scribbles, we generate realistic and diverse images with layout control over, for example, facial part layouts and body poses. We present a training strategy that performs pseudo labeling for semantic scribbles using the StyleGAN prior. Our key idea is to construct a simple mapping between StyleGAN features and each semantic class from a single example of semantic scribbles. With such mappings, we can generate an unlimited number of pseudo semantic scribbles from random noise to train an encoder for controlling a pretrained StyleGAN generator. Even with our rough pseudo semantic scribbles obtained via one‐shot supervision, our method can synthesize high‐quality images thanks to our GAN inversion framework. We further offer optimization‐based postprocessing to refine the pixel alignment of synthesized images. Qualitative and quantitative results on various datasets demonstrate improvement over previous approaches in one‐shot settings. Our method can synthesize photorealistic images from rough semantic scribbles using a single training pair and a pretrained StyleGAN model.
ISSN:	1546-4261 1546-427X
DOI:	10.1002/cav.2102
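
Illustrative sketch (not taken from the article): the abstract describes "a simple mapping between StyleGAN features and each semantic class" built from one scribbled example, but this record does not say what form that mapping takes. The Python sketch below shows one plausible reading under stated assumptions: each class is represented by the mean feature vector under its scribble, and new feature maps are labeled by cosine similarity to those means, with low-similarity pixels left unlabeled. The function names `class_prototypes` and `pseudo_scribbles`, the threshold value, and the random arrays standing in for generator features are all hypothetical.

```python
import numpy as np


def class_prototypes(features, scribbles, num_classes):
    """Average the feature vectors lying under each scribbled class.

    features:  (H, W, C) per-pixel features (stand-in for generator activations).
    scribbles: (H, W) integer class labels, -1 where the pixel is unannotated.
    """
    protos = np.zeros((num_classes, features.shape[-1]))
    for k in range(num_classes):
        mask = scribbles == k
        if mask.any():
            protos[k] = features[mask].mean(axis=0)
    return protos


def pseudo_scribbles(features, protos, threshold=0.8):
    """Assign each pixel its most similar class prototype (cosine similarity).

    Pixels whose best similarity falls below `threshold` stay unlabeled (-1),
    which keeps the resulting pseudo labels sparse, like scribbles.
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=-1, keepdims=True) + 1e-8)
    sim = f @ p.T                      # (H, W, num_classes)
    labels = sim.argmax(axis=-1)
    labels[sim.max(axis=-1) < threshold] = -1
    return labels


# Toy usage: random arrays stand in for real generator features (assumption).
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(8, 8, 16))
ref_scribbles = np.full((8, 8), -1)
ref_scribbles[1, 1:4] = 0              # a short stroke for class 0
ref_scribbles[6, 2:6] = 1              # a short stroke for class 1
protos = class_prototypes(ref_feats, ref_scribbles, num_classes=2)

new_feats = rng.normal(size=(8, 8, 16))  # features of a freshly sampled image
labels = pseudo_scribbles(new_feats, protos, threshold=0.3)
print(labels.shape)
```

In a pipeline like the one the abstract outlines, pairs of sampled images and such pseudo scribbles would then serve as training data for the encoder; that training step is not sketched here.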