HifiDiff: High-fidelity diffusion model for face hallucination from tiny non-frontal faces
Published in: | Neurocomputing (Amsterdam) 2025-02, Vol.616, p.128882, Article 128882 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Obtaining a high-quality frontal facial image from a low-resolution (LR) non-frontal facial image is crucial for many facial analysis tasks. Recently, diffusion models (DMs) have made impressive progress in near-frontal face super-resolution. However, when faced with non-frontal LR faces, existing DMs exhibit poor identity preservation and facial detail fidelity. In this paper, we present a novel high-fidelity DM named HifiDiff for simultaneously super-resolving and frontalizing tiny non-frontal facial images. It consists of a two-stage pipeline: facial preview and facial refinement. In the first stage, we pretrain a coarse restoration module to obtain a coarse high-resolution (HR) frontal face, which serves as a superior constraint condition to enhance the ability to solve complex inverse transform issues. In the second stage, we leverage the strong generation capabilities of the latent DM to refine the facial details. Specifically, we design a two-pathway control structure that consists of a facial prior guidance (FPG) module and an identity consistency (IDC) module to control the denoising process. FPG encodes multilevel features derived from latent coarse HR frontal faces and employs hybrid cross-attention to capture their intrinsic correlations with the denoiser features, thereby improving the fidelity of the facial details. IDC utilizes contrastive learning to extract high-level semantic identity-representing features to constrain the denoiser, thereby maintaining the fidelity of facial identities. Extensive experiments demonstrate that HifiDiff produces both high-fidelity and realistic HR frontal facial images, surpassing other state-of-the-art methods in qualitative and quantitative analyses, as well as in downstream facial recognition tasks. |
---|---|
ISSN: | 0925-2312 |
DOI: | 10.1016/j.neucom.2024.128882 |