Designing an Encoder for StyleGAN Image Manipulation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Recently, there has been a surge of diverse methods for performing image
editing by employing pre-trained unconditional generators. Applying these
methods on real images, however, remains a challenge, as it necessarily
requires the inversion of the images into their latent space. To successfully
invert a real image, one needs to find a latent code that reconstructs the
input image accurately, and more importantly, allows for its meaningful
manipulation. In this paper, we carefully study the latent space of StyleGAN,
the state-of-the-art unconditional generator. We identify and analyze the
existence of a distortion-editability tradeoff and a distortion-perception
tradeoff within the StyleGAN latent space. We then suggest two principles for
designing encoders in a manner that allows one to control the proximity of the
inversions to regions that StyleGAN was originally trained on. We present an
encoder based on our two principles that is specifically designed for
facilitating editing on real images by balancing these tradeoffs. By evaluating
its performance qualitatively and quantitatively on numerous challenging
domains, including cars and horses, we show that our inversion method, followed
by common editing techniques, achieves superior real-image editing quality,
with only a small reconstruction accuracy drop. |
---|---|
DOI: | 10.48550/arxiv.2102.02766 |