CIPS-3D++: End-to-End Real-Time High-Resolution 3D-Aware GANs for GAN Inversion and Stylization

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-10, Vol. 45 (10), p. 11502-11520
Main authors: Zhou, Peng; Xie, Lingxi; Ni, Bingbing; Tian, Qi
Format: Article
Language: English
Description
Summary: Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, these methods either rely on convolution operators, which are not rotationally invariant, or use complex yet suboptimal training procedures to integrate NeRF and CNN sub-structures, yielding non-robust, low-quality images with a large computational burden. This article presents an upgraded version called CIPS-3D++, aiming at robust, high-resolution, and efficient 3D-aware GANs. On the one hand, our basic model CIPS-3D, encapsulated in a style-based architecture, features a shallow NeRF-based 3D shape encoder as well as a deep MLP-based 2D image decoder, achieving robust, rotation-invariant image generation and editing. On the other hand, our proposed CIPS-3D++ inherits the rotational invariance of CIPS-3D and adds geometric regularization and upsampling operations, which encourage high-resolution, high-quality image generation and editing with great computational efficiency. Trained on raw single-view images, without any bells and whistles, CIPS-3D++ sets new records for 3D-aware image synthesis, with an FID of 3.2 on FFHQ at 1024×1024 resolution. CIPS-3D++ also runs efficiently and has a low GPU memory footprint, so it can be trained end-to-end on high-resolution images directly, in contrast to previous alternate/progressive methods. Based on the infrastructure of CIPS-3D++, we propose a 3D-aware GAN inversion algorithm named FlipInversion, which can reconstruct the 3D object from a single-view image. We also provide a 3D-aware stylization method for real images based on CIPS-3D++ and FlipInversion.
In addition, we analyze the problem of mirror symmetry suffered in training, and solve it by introducing an auxiliary discriminator for the NeRF network. Overall, CIPS-3D++ provides a strong base model that can serve as a testbed for transferring GAN-based image editing methods from 2D to 3D.
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2023.3285648