FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping

This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is...


Bibliographic details
Main authors:	Zhang, Yu; Zeng, Hao; Ma, Bowen; Zhang, Wei; Zhang, Zhimeng; Ding, Yu; Lv, Tangjie; Fan, Changjie
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

creator	Zhang, Yu; Zeng, Hao; Ma, Bowen; Zhang, Wei; Zhang, Zhimeng; Ding, Yu; Lv, Tangjie; Fan, Changjie
description	This work proposes a novel face-swapping framework, FlowFace++, which uses explicit semantic flow supervision and an end-to-end architecture to facilitate shape-aware face swapping. Specifically, our work pre-trains a facial shape discriminator to supervise the face-swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face-swapping network to generate highly realistic results. The face-swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates realistic final results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g., expression, head pose, hair, background, and illumination). Lastly, the convolutional decoder synthesizes the swapping results from the face-swapping latent embedding produced by the cross-attention fusion module. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that FlowFace++ significantly outperforms the state of the art, particularly when the source face is obstructed by uneven lighting or an angle offset.
doi_str_mv	10.48550/arxiv.2306.12686
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2306_12686</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2306_12686</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-2daa7d858088bfe7f88393c80c6f79ba9121a45ec30b94fc44d54e224263f6c43</originalsourceid><addsrcrecordid>eNotjztPwzAURr0woNIfwIT3yqnjV27YUJUAUiWGdo9u_ECW0tRKQh__vqQwneHTd6RDyHPOMwVa8zUOl3jKhOQmy4UB80jqujuea7R-tXql1SV10caJ7vwB-ylaOq9s_El-OMXRO1r1jk1H9gs6n-jujCnF_vuJPATsRr_854Ls62q_-WDbr_fPzduWoSkMEw6xcKCBA7TBFwFAltICtyYUZYtlLnJU2lvJ21IFq5TTyguhhJHBWCUX5OVPew9p0hAPOFybOai5B8kbORhEuA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</title><source>arXiv.org</source><creator>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</creator><creatorcontrib>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</creatorcontrib><description>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. 
The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</description><identifier>DOI: 10.48550/arxiv.2306.12686</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2306.12686$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2306.12686$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>Zeng, Hao</creatorcontrib><creatorcontrib>Ma, Bowen</creatorcontrib><creatorcontrib>Zhang, Wei</creatorcontrib><creatorcontrib>Zhang, Zhimeng</creatorcontrib><creatorcontrib>Ding, Yu</creatorcontrib><creatorcontrib>Lv, Tangjie</creatorcontrib><creatorcontrib>Fan, Changjie</creatorcontrib><title>FlowFace++: 
Explicit Semantic Flow-supervised End-to-End Face Swapping</title><description>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. 
Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotjztPwzAURr0woNIfwIT3yqnjV27YUJUAUiWGdo9u_ECW0tRKQh__vqQwneHTd6RDyHPOMwVa8zUOl3jKhOQmy4UB80jqujuea7R-tXql1SV10caJ7vwB-ylaOq9s_El-OMXRO1r1jk1H9gs6n-jujCnF_vuJPATsRr_854Ls62q_-WDbr_fPzduWoSkMEw6xcKCBA7TBFwFAltICtyYUZYtlLnJU2lvJ21IFq5TTyguhhJHBWCUX5OVPew9p0hAPOFybOai5B8kbORhEuA</recordid><startdate>20230622</startdate><enddate>20230622</enddate><creator>Zhang, Yu</creator><creator>Zeng, Hao</creator><creator>Ma, Bowen</creator><creator>Zhang, Wei</creator><creator>Zhang, Zhimeng</creator><creator>Ding, Yu</creator><creator>Lv, Tangjie</creator><creator>Fan, Changjie</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230622</creationdate><title>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</title><author>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-2daa7d858088bfe7f88393c80c6f79ba9121a45ec30b94fc44d54e224263f6c43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>Zeng, Hao</creatorcontrib><creatorcontrib>Ma, Bowen</creatorcontrib><creatorcontrib>Zhang, Wei</creatorcontrib><creatorcontrib>Zhang, Zhimeng</creatorcontrib><creatorcontrib>Ding, 
Yu</creatorcontrib><creatorcontrib>Lv, Tangjie</creatorcontrib><creatorcontrib>Fan, Changjie</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Yu</au><au>Zeng, Hao</au><au>Ma, Bowen</au><au>Zhang, Wei</au><au>Zhang, Zhimeng</au><au>Ding, Yu</au><au>Lv, Tangjie</au><au>Fan, Changjie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</atitle><date>2023-06-22</date><risdate>2023</risdate><abstract>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. 
Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</abstract><doi>10.48550/arxiv.2306.12686</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2306.12686
language	eng
recordid	cdi_arxiv_primary_2306_12686
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T13%3A09%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FlowFace++:%20Explicit%20Semantic%20Flow-supervised%20End-to-End%20Face%20Swapping&rft.au=Zhang,%20Yu&rft.date=2023-06-22&rft_id=info:doi/10.48550/arxiv.2306.12686&rft_dat=%3Carxiv_GOX%3E2306_12686%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
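
The description field names a cross-attention fusion module that swaps source identity into the target in a latent space. As a rough illustration of the general cross-attention idea only, and not the authors' implementation, the sketch below lets each target token query the source tokens and adds the attended source features back onto the target token. Everything here is an assumption made for illustration: the function name `cross_attention`, the absence of learned projections, the residual addition, and the toy two-dimensional tokens.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(target_tokens, source_tokens):
    """Hypothetical sketch: queries come from the target, keys and values
    from the source. No learned projections are used (an assumption)."""
    dim = len(target_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in target_tokens:
        # Scaled dot-product score between this target token and each source token.
        scores = [scale * sum(a * b for a, b in zip(query, key))
                  for key in source_tokens]
        weights = softmax(scores)
        # Weighted average of source tokens (values equal keys in this sketch).
        mixed = [sum(w * value[d] for w, value in zip(weights, source_tokens))
                 for d in range(dim)]
        # Residual addition so the target token's own content is retained
        # (an assumed design choice, not taken from the record).
        fused.append([t + m for t, m in zip(query, mixed)])
    return fused


# Toy latent tokens: three target tokens, two source tokens, dimension 2.
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source = [[0.5, 0.5], [2.0, -1.0]]
fused = cross_attention(target, source)
```

The output has one fused token per target token, so a downstream decoder would see the same spatial layout as the target. In a real model the queries, keys, and values would pass through learned linear projections and multiple heads.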