StyleFusion: A Generative Model for Disentangling Spatial Segments


Bibliographic Details
Main Authors: Kafri, Omer; Patashnik, Or; Alaluf, Yuval; Cohen-Or, Daniel
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Kafri, Omer; Patashnik, Or; Alaluf, Yuval; Cohen-Or, Daniel
description We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest.
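The fusion the abstract describes (several per-region latent codes blended into one style code, plus a special global code) can be sketched as a per-layer weighted blend. This is only an illustrative stand-in: the region count, layer count, blend weights, and the `alpha` mixing strength are all hypothetical, and the random `weights` replace the paper's learned hierarchical fusion network.

```python
import numpy as np

rng = np.random.default_rng(0)

num_regions = 3   # e.g. hair, face, background (hypothetical split)
style_dim = 512   # StyleGAN style-code dimensionality
num_layers = 18   # style layers of a 1024x1024 StyleGAN2 generator

# One style code per semantic region, plus a special "global" code.
region_codes = rng.standard_normal((num_regions, num_layers, style_dim))
global_code = rng.standard_normal((num_layers, style_dim))

# Stand-in for the learned fusion: per-layer soft weights over regions.
# In StyleFusion these come from a learned hierarchical fusion network;
# here they are just softmax-normalized random logits for illustration.
logits = rng.standard_normal((num_layers, num_regions))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Blend the region codes layer by layer, then mix in the global code,
# yielding a single style code to feed a pre-trained generator.
fused = np.einsum("lr,rld->ld", weights, region_codes)
alpha = 0.2  # hypothetical global-control strength
style_code = (1 - alpha) * fused + alpha * global_code

print(style_code.shape)  # one style code covering all 18 layers
```

Because each region keeps its own input latent code, editing one region's code and re-fusing changes only the layers and regions that code dominates, which is the disentangled control the abstract claims.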
doi_str_mv 10.48550/arxiv.2107.07437
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2107.07437
language eng
recordid cdi_arxiv_primary_2107_07437
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title StyleFusion: A Generative Model for Disentangling Spatial Segments