CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization

Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence 2023-10, Vol.45 (10), p.11502-11520
Main authors: Zhou, Peng; Xie, Lingxi; Ni, Bingbing; Tian, Qi
Format: Article
Language: eng
container_end_page 11520
container_issue 10
container_start_page 11502
container_title IEEE transactions on pattern analysis and machine intelligence
container_volume 45
creator Zhou, Peng
Xie, Lingxi
Ni, Bingbing
Tian, Qi
description Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++, aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the 1024×1024 resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion, which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.
doi_str_mv 10.1109/TPAMI.2023.3285648
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2861454233</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10149489</ieee_id><sourcerecordid>2825810974</sourcerecordid><originalsourceid>FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</originalsourceid><addsrcrecordid>eNpdkN1O3DAQRi3UChboC6CqisQNEvJie5zE5m61_K1EWwTLteXEkzYom4CdgOjT12G3CPVqRprzfRodQg44m3LO9MnyZvZ9MRVMwBSESjOptsiEa9AUUtCfyITxTFClhNohuyE8MMZlymCb7EAOnCmZTYiZL27uKJwdH58m562jfUfjSG7RNnRZrzC5qn_9prcYumbo665N4IzOXqzH5HL2IyRV58clWbTP6MN4tzF917829R878vvkc2WbgF82c4_cX5wv51f0-uflYj67piUw6GlRpsgcgssrK7AqIdNQZFnGmHPWCslcXmCOILnVElXpOK904RwruEqtBtgjR-veR989DRh6s6pDiU1jW-yGYIQSqYrSchnRw__Qh27wbfwuUllUJAWMhWJNlb4LwWNlHn29sv7VcGZG_eZNvxn1m43-GPq2qR6KFbr3yD_fEfi6BmpE_NDIpZZKw19GVIaF</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861454233</pqid></control><display><type>article</type><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><source>IEEE Electronic Library (IEL)</source><creator>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</creator><creatorcontrib>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</creatorcontrib><description><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. 
This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2023.3285648</identifier><identifier>PMID: 37310846</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>3D-aware ; Algorithms ; Coders ; Editing ; GANs ; Generative adversarial networks ; Generators ; High resolution ; Image contrast ; Image processing ; Image quality ; Image resolution ; Invariance ; inversion ; Mirrors ; Operators (mathematics) ; Regularization ; Semantics ; Shape ; Solid modeling ; stylization ; Three dimensional models ; Three-dimensional displays ; Training</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-10, Vol.45 (10), p.11502-11520</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</cites><orcidid>0000-0003-4831-9451 ; 0000-0001-7339-028X ; 0000-0002-0674-9296 ; 0000-0002-7252-5047</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10149489$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10149489$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37310846$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhou, Peng</creatorcontrib><creatorcontrib>Xie, Lingxi</creatorcontrib><creatorcontrib>Ni, Bingbing</creatorcontrib><creatorcontrib>Tian, Qi</creatorcontrib><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. 
On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></description><subject>3D-aware</subject><subject>Algorithms</subject><subject>Coders</subject><subject>Editing</subject><subject>GANs</subject><subject>Generative adversarial networks</subject><subject>Generators</subject><subject>High resolution</subject><subject>Image contrast</subject><subject>Image processing</subject><subject>Image quality</subject><subject>Image resolution</subject><subject>Invariance</subject><subject>inversion</subject><subject>Mirrors</subject><subject>Operators (mathematics)</subject><subject>Regularization</subject><subject>Semantics</subject><subject>Shape</subject><subject>Solid modeling</subject><subject>stylization</subject><subject>Three dimensional models</subject><subject>Three-dimensional displays</subject><subject>Training</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkN1O3DAQRi3UChboC6CqisQNEvJie5zE5m61_K1EWwTLteXEkzYom4CdgOjT12G3CPVqRprzfRodQg44m3LO9MnyZvZ9MRVMwBSESjOptsiEa9AUUtCfyITxTFClhNohuyE8MMZlymCb7EAOnCmZTYiZL27uKJwdH58m562jfUfjSG7RNnRZrzC5qn_9prcYumbo665N4IzOXqzH5HL2IyRV58clWbTP6MN4tzF917829R878vvkc2WbgF82c4_cX5wv51f0-uflYj67piUw6GlRpsgcgssrK7AqIdNQZFnGmHPWCslcXmCOILnVElXpOK904RwruEqtBtgjR-veR989DRh6s6pDiU1jW-yGYIQSqYrSchnRw__Qh27wbfwuUllUJAWMhWJNlb4LwWNlHn29sv7VcGZG_eZNvxn1m43-GPq2qR6KFbr3yD_fEfi6BmpE_NDIpZZKw19GVIaF</recordid><startdate>20231001</startdate><enddate>20231001</enddate><creator>Zhou, Peng</creator><creator>Xie, Lingxi</creator><creator>Ni, Bingbing</creator><creator>Tian, Qi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4831-9451</orcidid><orcidid>https://orcid.org/0000-0001-7339-028X</orcidid><orcidid>https://orcid.org/0000-0002-0674-9296</orcidid><orcidid>https://orcid.org/0000-0002-7252-5047</orcidid></search><sort><creationdate>20231001</creationdate><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><author>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>3D-aware</topic><topic>Algorithms</topic><topic>Coders</topic><topic>Editing</topic><topic>GANs</topic><topic>Generative adversarial networks</topic><topic>Generators</topic><topic>High resolution</topic><topic>Image contrast</topic><topic>Image processing</topic><topic>Image quality</topic><topic>Image resolution</topic><topic>Invariance</topic><topic>inversion</topic><topic>Mirrors</topic><topic>Operators (mathematics)</topic><topic>Regularization</topic><topic>Semantics</topic><topic>Shape</topic><topic>Solid modeling</topic><topic>stylization</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Peng</creatorcontrib><creatorcontrib>Xie, Lingxi</creatorcontrib><creatorcontrib>Ni, Bingbing</creatorcontrib><creatorcontrib>Tian, Qi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhou, Peng</au><au>Xie, Lingxi</au><au>Ni, Bingbing</au><au>Tian, Qi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2023-10-01</date><risdate>2023</risdate><volume>45</volume><issue>10</issue><spage>11502</spage><epage>11520</epage><pages>11502-11520</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. 
However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></abstract><cop>United States</cop><pub>IEEE</pub><pmid>37310846</pmid><doi>10.1109/TPAMI.2023.3285648</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-4831-9451</orcidid><orcidid>https://orcid.org/0000-0001-7339-028X</orcidid><orcidid>https://orcid.org/0000-0002-0674-9296</orcidid><orcidid>https://orcid.org/0000-0002-7252-5047</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2023-10, Vol.45 (10), p.11502-11520
issn 0162-8828
1939-3539
2160-9292
language eng
recordid cdi_proquest_journals_2861454233
source IEEE Electronic Library (IEL)
subjects 3D-aware
Algorithms
Coders
Editing
GANs
Generative adversarial networks
Generators
High resolution
Image contrast
Image processing
Image quality
Image resolution
Invariance
inversion
Mirrors
Operators (mathematics)
Regularization
Semantics
Shape
Solid modeling
stylization
Three dimensional models
Three-dimensional displays
Training
title CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T12%3A45%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CIPS-3D++:%20End-to-End%20Real-Time%20High-Resolution%203D-Aware%20GANs%20for%20GAN%20Inversion%20and%20Stylization&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Zhou,%20Peng&rft.date=2023-10-01&rft.volume=45&rft.issue=10&rft.spage=11502&rft.epage=11520&rft.pages=11502-11520&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2023.3285648&rft_dat=%3Cproquest_RIE%3E2825810974%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861454233&rft_id=info:pmid/37310846&rft_ieee_id=10149489&rfr_iscdi=true