Designing an Encoder for StyleGAN Image Manipulation

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully inv...

Bibliographic Details
Main Authors:	Tov, Omer; Alaluf, Yuval; Nitzan, Yotam; Patashnik, Or; Cohen-Or, Daniel
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition
Online Access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Tov, Omer; Alaluf, Yuval; Nitzan, Yotam; Patashnik, Or; Cohen-Or, Daniel
description	Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.
doi_str_mv	10.48550/arxiv.2102.02766
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2102_02766</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2102_02766</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-af46e3a8fc5b5f2b9aba02a06b9de8dd27d63f3f60c7cf0e9127047d39eeaa8c3</originalsourceid><addsrcrecordid>eNotzruOwjAQQFE3FAj2A6jwDyRM7MROSgQsIPEooI8m9jiyFBxk2NXy9yse1e2uDmOTDNK8LAqYYfzzv6nIQKQgtFJDli_p5tvgQ8sx8FUwvaXIXR_56f7oaD0_8O0FW-J7DP760-Hd92HMBg67G319OmLn79V5sUl2x_V2Md8lqLRK0OWKJJbOFE3hRFNhgyAQVFNZKq0V2irppFNgtHFAVSY05NrKigixNHLEpu_ti11fo79gfNRPfv3iy39LLT_r</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Designing an Encoder for StyleGAN Image Manipulation</title><source>arXiv.org</source><creator>Tov, Omer ; Alaluf, Yuval ; Nitzan, Yotam ; Patashnik, Or ; Cohen-Or, Daniel</creator><creatorcontrib>Tov, Omer ; Alaluf, Yuval ; Nitzan, Yotam ; Patashnik, Or ; Cohen-Or, Daniel</creatorcontrib><description>Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. 
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.</description><identifier>DOI: 10.48550/arxiv.2102.02766</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2021-02</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2102.02766$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2102.02766$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Tov, Omer</creatorcontrib><creatorcontrib>Alaluf, Yuval</creatorcontrib><creatorcontrib>Nitzan, Yotam</creatorcontrib><creatorcontrib>Patashnik, Or</creatorcontrib><creatorcontrib>Cohen-Or, Daniel</creatorcontrib><title>Designing an Encoder for StyleGAN Image Manipulation</title><description>Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. 
To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzruOwjAQQFE3FAj2A6jwDyRM7MROSgQsIPEooI8m9jiyFBxk2NXy9yse1e2uDmOTDNK8LAqYYfzzv6nIQKQgtFJDli_p5tvgQ8sx8FUwvaXIXR_56f7oaD0_8O0FW-J7DP760-Hd92HMBg67G319OmLn79V5sUl2x_V2Md8lqLRK0OWKJJbOFE3hRFNhgyAQVFNZKq0V2irppFNgtHFAVSY05NrKigixNHLEpu_ti11fo79gfNRPfv3iy39LLT_r</recordid><startdate>20210204</startdate><enddate>20210204</enddate><creator>Tov, Omer</creator><creator>Alaluf, Yuval</creator><creator>Nitzan, Yotam</creator><creator>Patashnik, Or</creator><creator>Cohen-Or, Daniel</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20210204</creationdate><title>Designing an Encoder for StyleGAN Image Manipulation</title><author>Tov, Omer ; Alaluf, Yuval ; Nitzan, Yotam ; Patashnik, 
Or ; Cohen-Or, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-af46e3a8fc5b5f2b9aba02a06b9de8dd27d63f3f60c7cf0e9127047d39eeaa8c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Tov, Omer</creatorcontrib><creatorcontrib>Alaluf, Yuval</creatorcontrib><creatorcontrib>Nitzan, Yotam</creatorcontrib><creatorcontrib>Patashnik, Or</creatorcontrib><creatorcontrib>Cohen-Or, Daniel</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tov, Omer</au><au>Alaluf, Yuval</au><au>Nitzan, Yotam</au><au>Patashnik, Or</au><au>Cohen-Or, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Designing an Encoder for StyleGAN Image Manipulation</atitle><date>2021-02-04</date><risdate>2021</risdate><abstract>Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. 
We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.</abstract><doi>10.48550/arxiv.2102.02766</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2102.02766
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2102_02766
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Designing an Encoder for StyleGAN Image Manipulation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T01%3A39%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Designing%20an%20Encoder%20for%20StyleGAN%20Image%20Manipulation&rft.au=Tov,%20Omer&rft.date=2021-02-04&rft_id=info:doi/10.48550/arxiv.2102.02766&rft_dat=%3Carxiv_GOX%3E2102_02766%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true