StyleFusion: A Generative Model for Disentangling Spatial Segments


Bibliographic Details
Main Authors: Kafri, Omer; Patashnik, Or; Alaluf, Yuval; Cohen-Or, Daniel
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Kafri, Omer; Patashnik, Or; Alaluf, Yuval; Cohen-Or, Daniel
description We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest.
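The fusion the abstract describes (several per-region latent codes blended into one style code, plus a special global code) can be sketched as a per-layer weighted blend. This is only an illustrative stand-in: the region count, layer count, blend weights, and the `alpha` mixing strength are all hypothetical, and the random `weights` replace the paper's learned hierarchical fusion network.

```python
import numpy as np

rng = np.random.default_rng(0)

num_regions = 3   # e.g. hair, face, background (hypothetical split)
style_dim = 512   # StyleGAN style-code dimensionality
num_layers = 18   # style layers of a 1024x1024 StyleGAN2 generator

# One style code per semantic region, plus a special "global" code.
region_codes = rng.standard_normal((num_regions, num_layers, style_dim))
global_code = rng.standard_normal((num_layers, style_dim))

# Stand-in for the learned fusion: per-layer soft weights over regions.
# In StyleFusion these come from a learned hierarchical fusion network;
# here they are just softmax-normalized random logits for illustration.
logits = rng.standard_normal((num_layers, num_regions))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Blend the region codes layer by layer, then mix in the global code,
# yielding a single style code to feed a pre-trained generator.
fused = np.einsum("lr,rld->ld", weights, region_codes)
alpha = 0.2  # hypothetical global-control strength
style_code = (1 - alpha) * fused + alpha * global_code

print(style_code.shape)  # one style code covering all 18 layers
```

Because each region keeps its own input latent code, editing one region's code and re-fusing changes only the layers and regions that code dominates, which is the disentangled control the abstract claims.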
doi_str_mv 10.48550/arxiv.2107.07437
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2107.07437
language eng
recordid cdi_arxiv_primary_2107_07437
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title StyleFusion: A Generative Model for Disentangling Spatial Segments