CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization
Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which...
Published in: | IEEE transactions on pattern analysis and machine intelligence 2023-10, Vol.45 (10), p.11502-11520 |
---|---|
Main authors: | Zhou, Peng; Xie, Lingxi; Ni, Bingbing; Tian, Qi |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |

container_end_page | 11520 |
---|---|
container_issue | 10 |
container_start_page | 11502 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 45 |
creator | Zhou, Peng Xie, Lingxi Ni, Bingbing Tian, Qi |
description | Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++, aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the 1024×1024 resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion, which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D. |
doi_str_mv | 10.1109/TPAMI.2023.3285648 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2861454233</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10149489</ieee_id><sourcerecordid>2825810974</sourcerecordid><originalsourceid>FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</originalsourceid><addsrcrecordid>eNpdkN1O3DAQRi3UChboC6CqisQNEvJie5zE5m61_K1EWwTLteXEkzYom4CdgOjT12G3CPVqRprzfRodQg44m3LO9MnyZvZ9MRVMwBSESjOptsiEa9AUUtCfyITxTFClhNohuyE8MMZlymCb7EAOnCmZTYiZL27uKJwdH58m562jfUfjSG7RNnRZrzC5qn_9prcYumbo665N4IzOXqzH5HL2IyRV58clWbTP6MN4tzF917829R878vvkc2WbgF82c4_cX5wv51f0-uflYj67piUw6GlRpsgcgssrK7AqIdNQZFnGmHPWCslcXmCOILnVElXpOK904RwruEqtBtgjR-veR989DRh6s6pDiU1jW-yGYIQSqYrSchnRw__Qh27wbfwuUllUJAWMhWJNlb4LwWNlHn29sv7VcGZG_eZNvxn1m43-GPq2qR6KFbr3yD_fEfi6BmpE_NDIpZZKw19GVIaF</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861454233</pqid></control><display><type>article</type><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><source>IEEE Electronic Library (IEL)</source><creator>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</creator><creatorcontrib>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</creatorcontrib><description><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. 
This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2023.3285648</identifier><identifier>PMID: 37310846</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>3D-aware ; Algorithms ; Coders ; Editing ; GANs ; Generative adversarial networks ; Generators ; High resolution ; Image contrast ; Image processing ; Image quality ; Image resolution ; Invariance ; inversion ; Mirrors ; Operators (mathematics) ; Regularization ; Semantics ; Shape ; Solid modeling ; stylization ; Three dimensional models ; Three-dimensional displays ; Training</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-10, Vol.45 (10), p.11502-11520</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</cites><orcidid>0000-0003-4831-9451 ; 0000-0001-7339-028X ; 0000-0002-0674-9296 ; 0000-0002-7252-5047</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10149489$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10149489$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37310846$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhou, Peng</creatorcontrib><creatorcontrib>Xie, Lingxi</creatorcontrib><creatorcontrib>Ni, Bingbing</creatorcontrib><creatorcontrib>Tian, Qi</creatorcontrib><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. 
On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></description><subject>3D-aware</subject><subject>Algorithms</subject><subject>Coders</subject><subject>Editing</subject><subject>GANs</subject><subject>Generative adversarial networks</subject><subject>Generators</subject><subject>High resolution</subject><subject>Image contrast</subject><subject>Image processing</subject><subject>Image quality</subject><subject>Image resolution</subject><subject>Invariance</subject><subject>inversion</subject><subject>Mirrors</subject><subject>Operators (mathematics)</subject><subject>Regularization</subject><subject>Semantics</subject><subject>Shape</subject><subject>Solid modeling</subject><subject>stylization</subject><subject>Three dimensional models</subject><subject>Three-dimensional displays</subject><subject>Training</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkN1O3DAQRi3UChboC6CqisQNEvJie5zE5m61_K1EWwTLteXEkzYom4CdgOjT12G3CPVqRprzfRodQg44m3LO9MnyZvZ9MRVMwBSESjOptsiEa9AUUtCfyITxTFClhNohuyE8MMZlymCb7EAOnCmZTYiZL27uKJwdH58m562jfUfjSG7RNnRZrzC5qn_9prcYumbo665N4IzOXqzH5HL2IyRV58clWbTP6MN4tzF917829R878vvkc2WbgF82c4_cX5wv51f0-uflYj67piUw6GlRpsgcgssrK7AqIdNQZFnGmHPWCslcXmCOILnVElXpOK904RwruEqtBtgjR-veR989DRh6s6pDiU1jW-yGYIQSqYrSchnRw__Qh27wbfwuUllUJAWMhWJNlb4LwWNlHn29sv7VcGZG_eZNvxn1m43-GPq2qR6KFbr3yD_fEfi6BmpE_NDIpZZKw19GVIaF</recordid><startdate>20231001</startdate><enddate>20231001</enddate><creator>Zhou, Peng</creator><creator>Xie, Lingxi</creator><creator>Ni, Bingbing</creator><creator>Tian, Qi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4831-9451</orcidid><orcidid>https://orcid.org/0000-0001-7339-028X</orcidid><orcidid>https://orcid.org/0000-0002-0674-9296</orcidid><orcidid>https://orcid.org/0000-0002-7252-5047</orcidid></search><sort><creationdate>20231001</creationdate><title>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</title><author>Zhou, Peng ; Xie, Lingxi ; Ni, Bingbing ; Tian, Qi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c303t-bc5e0de3d7fa2efc3693b66600ddaa240d7be7e341a94e8cd11f9bdd0b185a933</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>3D-aware</topic><topic>Algorithms</topic><topic>Coders</topic><topic>Editing</topic><topic>GANs</topic><topic>Generative adversarial networks</topic><topic>Generators</topic><topic>High resolution</topic><topic>Image contrast</topic><topic>Image processing</topic><topic>Image quality</topic><topic>Image resolution</topic><topic>Invariance</topic><topic>inversion</topic><topic>Mirrors</topic><topic>Operators (mathematics)</topic><topic>Regularization</topic><topic>Semantics</topic><topic>Shape</topic><topic>Solid modeling</topic><topic>stylization</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Peng</creatorcontrib><creatorcontrib>Xie, Lingxi</creatorcontrib><creatorcontrib>Ni, Bingbing</creatorcontrib><creatorcontrib>Tian, Qi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhou, Peng</au><au>Xie, Lingxi</au><au>Ni, Bingbing</au><au>Tian, Qi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2023-10-01</date><risdate>2023</risdate><volume>45</volume><issue>10</issue><spage>11502</spage><epage>11520</epage><pages>11502-11520</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract><![CDATA[Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. 
However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++ , aiming at high-robust, high-resolution and high-efficiency 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust image generation/editing with rotation-invariance. On the other hand, our proposed CIPS-3D++, inheriting the rotational invariance of CIPS-3D, together with geometric regularization and upsampling operations, encourages high-resolution high-quality image generation/editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an impressive FID of 3.2 on FFHQ at the <inline-formula><tex-math notation="LaTeX">1024\times 1024</tex-math> <mml:math><mml:mrow><mml:mn>1024</mml:mn><mml:mo>×</mml:mo><mml:mn>1024</mml:mn></mml:mrow></mml:math><inline-graphic xlink:href="zhou-ieq1-3285648.gif"/> </inline-formula> resolution. In the meantime, CIPS-3D++ runs efficiently and enjoys a low GPU memory footprint so that it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion , which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion. In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. 
Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.]]></abstract><cop>United States</cop><pub>IEEE</pub><pmid>37310846</pmid><doi>10.1109/TPAMI.2023.3285648</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-4831-9451</orcidid><orcidid>https://orcid.org/0000-0001-7339-028X</orcidid><orcidid>https://orcid.org/0000-0002-0674-9296</orcidid><orcidid>https://orcid.org/0000-0002-7252-5047</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2023-10, Vol.45 (10), p.11502-11520 |
issn | 0162-8828 1939-3539 2160-9292 |
language | eng |
recordid | cdi_proquest_journals_2861454233 |
source | IEEE Electronic Library (IEL) |
subjects | 3D-aware Algorithms Coders Editing GANs Generative adversarial networks Generators High resolution Image contrast Image processing Image quality Image resolution Invariance inversion Mirrors Operators (mathematics) Regularization Semantics Shape Solid modeling stylization Three dimensional models Three-dimensional displays Training |
title | CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T12%3A45%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CIPS-3D++:%20End-to-End%20Real-Time%20High-Resolution%203D-Aware%20GANs%20for%20GAN%20Inversion%20and%20Stylization&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Zhou,%20Peng&rft.date=2023-10-01&rft.volume=45&rft.issue=10&rft.spage=11502&rft.epage=11520&rft.pages=11502-11520&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2023.3285648&rft_dat=%3Cproquest_RIE%3E2825810974%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861454233&rft_id=info:pmid/37310846&rft_ieee_id=10149489&rfr_iscdi=true |
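
The `url` field above is an OpenURL whose query string carries the same citation data as the table rows, as `rft.*` keys and repeated `rft_id` identifiers. As a minimal sketch of how those values could be read back out, the Python below uses only the standard library; the constant `OPENURL` is a shortened copy of the link with a few of its keys, and the function name `parse_openurl` is a hypothetical helper introduced for illustration, not part of any catalog software.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's OpenURL; only some of its keys are kept here.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence"
    "&rft.au=Zhou,%20Peng&rft.date=2023-10-01&rft.volume=45&rft.issue=10"
    "&rft.spage=11502&rft.epage=11520"
    "&rft_id=info:doi/10.1109/TPAMI.2023.3285648"
    "&rft_id=info:oai/"
    "&rft_id=info:pmid/37310846"
)


def parse_openurl(url):
    """Hypothetical helper: split an OpenURL into citation fields and identifiers."""
    # parse_qs percent-decodes values and collects repeated keys into lists.
    query = parse_qs(urlsplit(url).query)

    # Citation fields use the "rft." prefix; keep the first value of each.
    referent = {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }

    # Identifiers look like "info:<scheme>/<value>"; skip entries with no value.
    identifiers = {}
    for value in query.get("rft_id", []):
        if value.startswith("info:"):
            scheme, _, ident = value[len("info:"):].partition("/")
            if ident:
                identifiers[scheme] = ident
    return referent, identifiers


referent, identifiers = parse_openurl(OPENURL)
print(referent["au"], referent["volume"], referent["issue"])
print(identifiers)
```

The empty `info:oai/` entry, which also appears in the full link, is dropped because it carries no value, so only the DOI and PubMed identifiers remain in the result.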