Multi-View Knowledge Ensemble With Frequency Consistency for Cross-Domain Face Translation

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-07, Vol. 35 (7), pp. 9728-9742
Main authors: Cao, Bing; Wang, Qinghe; Zhu, Pengfei; Hu, Qinghua; Ren, Dongwei; Zuo, Wangmeng; Gao, Xinbo
Format: Article
Language: English
Description
Abstract: Cross-domain face translation aims to transfer face images from one domain to another. It can be widely used in practical applications, such as photos/sketches in law enforcement, photos/drawings in digital entertainment, and near-infrared (NIR)/visible (VIS) images in security access control. Restricted by limited cross-domain face image pairs, the existing methods usually yield structural deformation or identity ambiguity, which leads to poor perceptual appearance. To address this challenge, we propose a multi-view knowledge (structural knowledge and identity knowledge) ensemble framework with frequency consistency (MvKE-FC) for cross-domain face translation. Due to the structural consistency of facial components, the multi-view knowledge learned from large-scale data can be appropriately transferred to limited cross-domain image pairs and significantly improve the generative performance. To better fuse multi-view knowledge, we further design an attention-based knowledge aggregation module that integrates useful information, and we also develop a frequency-consistent (FC) loss that constrains the generated images in the frequency domain. The designed FC loss consists of a multidirection Prewitt (mPrewitt) loss for high-frequency consistency and a Gaussian blur loss for low-frequency consistency. Furthermore, our FC loss can be flexibly applied to other generative models to enhance their overall performance. Extensive experiments on multiple cross-domain face datasets demonstrate the superiority of our method over state-of-the-art methods both qualitatively and quantitatively.
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2023.3236486
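
Illustrative note: the abstract describes the FC loss only at a high level, as a multidirection Prewitt term for high-frequency consistency plus a Gaussian blur term for low-frequency consistency. The sketch below shows one way such a loss could be assembled for a single grayscale image in plain Python. It is not the authors' implementation. The four kernel directions, the L1 distance, the 5x5 Gaussian with sigma 1.0, the equal weights, and all function names (`fc_loss`, `filter2d`, and so on) are assumptions made for illustration.

```python
import math

# Assumed set of four 3x3 Prewitt-style kernels. The abstract does not say
# which directions mPrewitt uses; these are hypothetical choices.
PREWITT_KERNELS = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],   # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],   # responds to horizontal edges
    [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],   # one diagonal direction
    [[-1, -1, 0], [-1, 0, 1], [0, 1, 1]],   # the other diagonal direction
]


def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel (entries sum to 1)."""
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def filter2d(img, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image given as a list of rows."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [
        [sum(kernel[i][j] * img[y + i][x + j]
             for i in range(kh) for j in range(kw))
         for x in range(w - kw + 1)]
        for y in range(h - kh + 1)
    ]


def mean_abs_diff(a, b):
    """Mean absolute (L1) difference between two equally sized 2-D maps."""
    count = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def fc_loss(generated, target, w_high=1.0, w_low=1.0):
    """Sketch of a frequency-consistency loss under the assumptions above."""
    # High-frequency term: compare edge responses, averaged over directions.
    high = sum(
        mean_abs_diff(filter2d(generated, k), filter2d(target, k))
        for k in PREWITT_KERNELS
    ) / len(PREWITT_KERNELS)
    # Low-frequency term: compare Gaussian-blurred versions of both images.
    blur = gaussian_kernel()
    low = mean_abs_diff(filter2d(generated, blur), filter2d(target, blur))
    return w_high * high + w_low * low
```

Under these assumptions, a uniform brightness offset between the two images is penalized only by the blur term, because each Prewitt kernel sums to zero, while differences in edge structure show up mainly in the Prewitt term. In a real training setup the same idea would normally be written with a tensor library so that the loss is differentiable and batched.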