FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping

This work proposes FlowFace++, a novel face-swapping framework that uses explicit semantic flow supervision and an end-to-end architecture to facilitate shape-aware face swapping. Specifically, the work pre-trains a facial shape discriminator to supervise the face-swapping network. The discriminator is...

Detailed description

Bibliographic details
Main authors: Zhang, Yu; Zeng, Hao; Ma, Bowen; Zhang, Wei; Zhang, Zhimeng; Ding, Yu; Lv, Tangjie; Fan, Changjie
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Zhang, Yu
Zeng, Hao
Ma, Bowen
Zhang, Wei
Zhang, Zhimeng
Ding, Yu
Lv, Tangjie
Fan, Changjie
description This work proposes FlowFace++, a novel face-swapping framework that uses explicit semantic flow supervision and an end-to-end architecture to facilitate shape-aware face swapping. Specifically, the work pre-trains a facial shape discriminator to supervise the face-swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thereby optimizing the face-swapping network to generate highly realistic results. The face-swapping network stacks a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space that is shared by the target and source faces, which facilitates realistic final results. The cross-attention fusion module carries out source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g., expression, head pose, hair, background, and illumination). Lastly, the convolutional decoder synthesizes the swapped result from the face-swapping latent embedding produced by the cross-attention fusion module. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that FlowFace++ significantly outperforms the state of the art, particularly when the source face is obscured by uneven lighting or an angle offset.
doi_str_mv 10.48550/arxiv.2306.12686
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2306_12686</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2306_12686</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-2daa7d858088bfe7f88393c80c6f79ba9121a45ec30b94fc44d54e224263f6c43</originalsourceid><addsrcrecordid>eNotjztPwzAURr0woNIfwIT3yqnjV27YUJUAUiWGdo9u_ECW0tRKQh__vqQwneHTd6RDyHPOMwVa8zUOl3jKhOQmy4UB80jqujuea7R-tXql1SV10caJ7vwB-ylaOq9s_El-OMXRO1r1jk1H9gs6n-jujCnF_vuJPATsRr_854Ls62q_-WDbr_fPzduWoSkMEw6xcKCBA7TBFwFAltICtyYUZYtlLnJU2lvJ21IFq5TTyguhhJHBWCUX5OVPew9p0hAPOFybOai5B8kbORhEuA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</title><source>arXiv.org</source><creator>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</creator><creatorcontrib>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</creatorcontrib><description>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. 
The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</description><identifier>DOI: 10.48550/arxiv.2306.12686</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2306.12686$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2306.12686$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>Zeng, Hao</creatorcontrib><creatorcontrib>Ma, Bowen</creatorcontrib><creatorcontrib>Zhang, Wei</creatorcontrib><creatorcontrib>Zhang, Zhimeng</creatorcontrib><creatorcontrib>Ding, Yu</creatorcontrib><creatorcontrib>Lv, Tangjie</creatorcontrib><creatorcontrib>Fan, Changjie</creatorcontrib><title>FlowFace++: 
Explicit Semantic Flow-supervised End-to-End Face Swapping</title><description>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. 
Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotjztPwzAURr0woNIfwIT3yqnjV27YUJUAUiWGdo9u_ECW0tRKQh__vqQwneHTd6RDyHPOMwVa8zUOl3jKhOQmy4UB80jqujuea7R-tXql1SV10caJ7vwB-ylaOq9s_El-OMXRO1r1jk1H9gs6n-jujCnF_vuJPATsRr_854Ls62q_-WDbr_fPzduWoSkMEw6xcKCBA7TBFwFAltICtyYUZYtlLnJU2lvJ21IFq5TTyguhhJHBWCUX5OVPew9p0hAPOFybOai5B8kbORhEuA</recordid><startdate>20230622</startdate><enddate>20230622</enddate><creator>Zhang, Yu</creator><creator>Zeng, Hao</creator><creator>Ma, Bowen</creator><creator>Zhang, Wei</creator><creator>Zhang, Zhimeng</creator><creator>Ding, Yu</creator><creator>Lv, Tangjie</creator><creator>Fan, Changjie</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230622</creationdate><title>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</title><author>Zhang, Yu ; Zeng, Hao ; Ma, Bowen ; Zhang, Wei ; Zhang, Zhimeng ; Ding, Yu ; Lv, Tangjie ; Fan, Changjie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-2daa7d858088bfe7f88393c80c6f79ba9121a45ec30b94fc44d54e224263f6c43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>Zeng, Hao</creatorcontrib><creatorcontrib>Ma, Bowen</creatorcontrib><creatorcontrib>Zhang, Wei</creatorcontrib><creatorcontrib>Zhang, Zhimeng</creatorcontrib><creatorcontrib>Ding, 
Yu</creatorcontrib><creatorcontrib>Lv, Tangjie</creatorcontrib><creatorcontrib>Fan, Changjie</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Yu</au><au>Zeng, Hao</au><au>Ma, Bowen</au><au>Zhang, Wei</au><au>Zhang, Zhimeng</au><au>Ding, Yu</au><au>Lv, Tangjie</au><au>Fan, Changjie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping</atitle><date>2023-06-22</date><risdate>2023</risdate><abstract>This work proposes a novel face-swapping framework FlowFace++, utilizing explicit semantic flow supervision and end-to-end architecture to facilitate shape-aware face-swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g. expression, head pose, hair, background, illumination, etc). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. 
Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace++ outperforms the state-of-the-art significantly, particularly while the source face is obstructed by uneven lighting or angle offset.</abstract><doi>10.48550/arxiv.2306.12686</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2306.12686
ispartof
issn
language eng
recordid cdi_arxiv_primary_2306_12686
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T13%3A09%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FlowFace++:%20Explicit%20Semantic%20Flow-supervised%20End-to-End%20Face%20Swapping&rft.au=Zhang,%20Yu&rft.date=2023-06-22&rft_id=info:doi/10.48550/arxiv.2306.12686&rft_dat=%3Carxiv_GOX%3E2306_12686%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
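
The description field above says the cross-attention fusion module performs source-to-target swapping in a latent space while preserving the target's other attributes. The following is a minimal sketch of generic single-head cross-attention, not the authors' implementation. It assumes target tokens act as queries and source tokens as keys and values, uses identity projections instead of learned weights, and adds a residual connection as one plausible way to retain target information. The function names `softmax` and `cross_attention` and the toy token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(target_tokens, source_tokens):
    """Fuse source information into each target token.

    Assumptions (not taken from the paper): a single head, identity
    query/key/value projections, and a residual connection.
    """
    dim = len(target_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in target_tokens:
        # Scaled dot-product score of this target token against every source token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in source_tokens
        ]
        weights = softmax(scores)
        # Attention-weighted average of the source tokens (used as values).
        mixed = [
            sum(w * value[d] for w, value in zip(weights, source_tokens))
            for d in range(dim)
        ]
        # Residual connection keeps the original target token in the output.
        fused.append([q + m for q, m in zip(query, mixed)])
    return fused


# Toy latent tokens: two target tokens and two identical source tokens.
target = [[1.0, 0.0], [0.0, 1.0]]
source = [[0.5, 0.5], [0.5, 0.5]]
fused = cross_attention(target, source)
```

In a real model the queries, keys, and values would come from learned linear projections of the MAE latents, and several heads and layers would typically be stacked; the sketch only shows the data flow from source tokens into target positions.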