UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing

Bibliographic details
Published in:	IEEE Transactions on Image Processing, 2021, Vol. 30, pp. 6107-6116
Authors:	Cao, Meng; Huang, Haozhi; Wang, Hao; Wang, Xuan; Shen, Li; Wang, Sheng; Bao, Linchao; Li, Zhifeng; Luo, Jiebo
Format:	Article
Language:	English
Keywords:	3D temporal loss; dynamic training sample selection; Editing; Faces; Facial video editing; Image reconstruction; Interpolation; Iterative methods; Optical losses; region-aware conditional normalization; Solid modeling; Task analysis; Three dimensional models; Three-dimensional displays; Training

Description
Abstract:	Recent research has witnessed advances in facial image editing tasks including face swapping and face reenactment. However, these methods are confined to dealing with one specific task at a time. In addition, for video facial editing, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flickers. In this paper, we propose a unified temporally consistent facial video editing framework termed UniFaceGAN. Based on a 3D reconstruction model and a simple yet efficient dynamic training sample selection mechanism, our framework is designed to handle face swapping and face reenactment simultaneously. To enforce temporal consistency, a novel 3D temporal loss constraint is introduced based on barycentric coordinate interpolation. Furthermore, we propose a region-aware conditional normalization layer to replace the traditional AdaIN or SPADE to synthesize more context-harmonious results. Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
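The abstract names barycentric coordinate interpolation as the basis of the 3D temporal loss but does not spell out the formulation. As a rough illustration of the general technique only (not the paper's method), the Python sketch below assumes a face mesh reconstructed in two consecutive frames whose triangles share vertex indices: a pixel's barycentric weights inside its projected triangle at frame t are reused to locate the corresponding point in the same triangle at frame t+1. A temporal loss could then compare the generated frames at such corresponding points; that step, and the helper names `barycentric`, `interpolate`, and `warp_point`, are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return barycentric weights (wa, wb, wc) of 2D point p w.r.t. triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb  # weights sum to 1
    return wa, wb, wc


def interpolate(weights: Tuple[float, float, float], a: Point, b: Point, c: Point) -> Point:
    """Blend the three vertex positions with the given barycentric weights."""
    wa, wb, wc = weights
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])


def warp_point(p: Point, tri_t: Triangle, tri_t1: Triangle) -> Point:
    """Hypothetical helper: carry p from frame t to frame t+1 through the
    same mesh triangle, assuming tri_t and tri_t1 are that triangle's
    projected vertices in the two frames (same vertex order)."""
    weights = barycentric(p, *tri_t)
    return interpolate(weights, *tri_t1)


if __name__ == "__main__":
    tri_t = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    tri_t1 = ((2.0, 3.0), (3.0, 3.0), (2.0, 4.0))  # same triangle, shifted
    print(warp_point((0.25, 0.25), tri_t, tri_t1))  # (2.25, 3.25)
```

Because the weights depend only on the point's position within its triangle, the correspondence follows the mesh even when the triangle moves or deforms between frames.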
ISSN:	1057-7149 (print); 1941-0042 (online)
DOI:	10.1109/TIP.2021.3089909