CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence 2023-10, Vol.45 (10), p.11502-11520
Main authors:	Zhou, Peng; Xie, Lingxi; Ni, Bingbing; Tian, Qi
Format:	Article
Language:	eng
Subjects:	3D-aware; Algorithms; Coders; Editing; GANs; Generative adversarial networks; Generators; High resolution; Image contrast; Image processing; Image quality; Image resolution; Invariance; inversion; Mirrors; Operators (mathematics); Regularization; Semantics; Shape; Solid modeling; stylization; Three dimensional models; Three-dimensional displays; Training

container_end_page	11520
container_issue	10
container_start_page	11502
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	45
creator	Zhou, Peng; Xie, Lingxi; Ni, Bingbing; Tian, Qi
description	Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, these methods either rely on convolution operators, which are not rotationally invariant, or use complex yet suboptimal training procedures to integrate NeRF and CNN sub-structures, yielding non-robust, low-quality images at a large computational cost. This article presents an upgraded version called CIPS-3D++, aiming at highly robust, high-resolution and highly efficient 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust, rotation-invariant image generation/editing. On the other hand, our proposed CIPS-3D++, which inherits the rotational invariance of CIPS-3D and adds geometric regularization and upsampling operations, enables high-resolution, high-quality image generation/editing with great computational efficiency. Trained on raw single-view images without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an FID of 3.2 on FFHQ at 1024×1024 resolution. Meanwhile, CIPS-3D++ runs efficiently and has a low GPU memory footprint, so it can be trained end-to-end directly on high-resolution images, in contrast to previous alternating/progressive methods. Building on CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion, which can reconstruct a 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the mirror-symmetry problem encountered during training and solve it by introducing an auxiliary discriminator for the NeRF network. Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.
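The abstract names the generator's two halves but not their internals. The following is a minimal NumPy sketch, under stated assumptions, of that split: a shallow MLP is queried at 3D points along camera rays, its density and feature outputs are volume-rendered into one feature vector per pixel, and a deeper MLP then decodes each pixel's feature independently into RGB. Everything here is hypothetical and not taken from the paper: the layer sizes, the `(1 + style)` scaling used as a crude stand-in for style modulation, the orbiting pinhole camera, rendering features rather than colors, and all function names (`init_mlp`, `mlp`, `camera_rays`, `volume_render`, `render`). It has no training, geometric regularization, upsampling, or auxiliary discriminator.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random (weight, bias) pairs for an MLP; layer sizes are illustrative only."""
    return [(rng.standard_normal((i, o)) / np.sqrt(i), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def mlp(params, x, style):
    """ReLU MLP; hidden activations are scaled by (1 + style), a hypothetical stand-in for style modulation."""
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0) * (1.0 + style)
    return x


def camera_rays(height, width, yaw, fov=0.5):
    """Pinhole rays from a camera on a unit circle around the origin, rotated by `yaw` about the y axis."""
    ys, xs = np.meshgrid(np.linspace(-fov, fov, height),
                         np.linspace(-fov, fov, width), indexing="ij")
    dirs = np.stack([xs, -ys, -np.ones_like(xs)], axis=-1).reshape(-1, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    origins = np.tile(rot @ np.array([0.0, 0.0, 1.0]), (dirs.shape[0], 1))
    return origins, dirs @ rot.T


def volume_render(sigma, feat, delta):
    """Alpha-composite per-sample features along each ray (standard NeRF quadrature)."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance before each sample: exp(-sum of sigma * delta over earlier samples).
    trans = np.exp(-np.concatenate(
        [np.zeros_like(sigma[:, :1]), np.cumsum(sigma * delta, axis=1)[:, :-1]], axis=1))
    weights = alpha * trans
    return (weights[..., None] * feat).sum(axis=1), weights


def render(height, width, yaw, style, nerf, decoder, near=0.5, far=1.5, n_samples=16):
    origins, dirs = camera_rays(height, width, yaw)
    t = np.linspace(near, far, n_samples)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]  # (rays, samples, 3)
    out = mlp(nerf, pts, style)                  # shallow "3D shape encoder"
    sigma = np.logaddexp(0.0, out[..., 0])       # softplus keeps density non-negative
    feats, weights = volume_render(sigma, out[..., 1:], t[1] - t[0])
    rgb = np.tanh(mlp(decoder, feats, style))    # deep per-pixel "2D image decoder"
    return rgb.reshape(height, width, 3), feats, weights


rng = np.random.default_rng(0)
HIDDEN, FEAT = 32, 16
nerf = init_mlp([3, HIDDEN, 1 + FEAT], rng)           # one hidden layer: "shallow"
decoder = init_mlp([FEAT] + [HIDDEN] * 4 + [3], rng)  # four hidden layers: "deep"
style = 0.1 * rng.standard_normal(HIDDEN)            # stand-in for a mapped latent code
img, feats, weights = render(8, 8, yaw=0.3, style=style, nerf=nerf, decoder=decoder)
print(img.shape)  # (8, 8, 3)
```

Because `decoder` only ever sees one pixel's feature vector, permuting the rendered features permutes the output pixels and nothing else; no operator mixes neighboring pixels, which is the kind of pixel-independent decoding the abstract contrasts with convolution operators. Changing `yaw` moves the camera explicitly, the control the abstract says style-based 2D GANs lack.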
doi_str_mv	10.1109/TPAMI.2023.3285648
format	Article
pmid	37310846
coden	ITPIDJ
pub	IEEE, United States
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
lds50	peer_reviewed
date	2023-10-01
tpages	19
addtitle	TPAMI; IEEE Trans Pattern Anal Mach Intell
orcidid	0000-0003-4831-9451; 0000-0001-7339-028X; 0000-0002-0674-9296; 0000-0002-7252-5047
ieee_id	10149489
linktohtml	https://ieeexplore.ieee.org/document/10149489
backlink	https://www.ncbi.nlm.nih.gov/pubmed/37310846
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2023-10, Vol.45 (10), p.11502-11520
issn	0162-8828 1939-3539 2160-9292
language	eng
recordid	cdi_proquest_journals_2861454233
source	IEEE Electronic Library (IEL)
subjects	3D-aware; Algorithms; Coders; Editing; GANs; Generative adversarial networks; Generators; High resolution; Image contrast; Image processing; Image quality; Image resolution; Invariance; inversion; Mirrors; Operators (mathematics); Regularization; Semantics; Shape; Solid modeling; stylization; Three dimensional models; Three-dimensional displays; Training
title	CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T12%3A45%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CIPS-3D++:%20End-to-End%20Real-Time%20High-Resolution%203D-Aware%20GANs%20for%20GAN%20Inversion%20and%20Stylization&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Zhou,%20Peng&rft.date=2023-10-01&rft.volume=45&rft.issue=10&rft.spage=11502&rft.epage=11520&rft.pages=11502-11520&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2023.3285648&rft_dat=%3Cproquest_RIE%3E2825810974%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861454233&rft_id=info:pmid/37310846&rft_ieee_id=10149489&rfr_iscdi=true