Brain-driven facial image reconstruction via StyleGAN inversion with improved identity consistency
The reconstruction of visual stimuli from fMRI data represents a major technological and scientific challenge at the forefront of contemporary neuroscience research. Deep learning techniques have played a critical role in advancing decoding models for visual stimulus reconstruction from fMRI data. P...
Gespeichert in:
Veröffentlicht in: | Pattern recognition 2024-06, Vol.150, p.110331, Article 110331 |
---|---|
Hauptverfasser: | , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The reconstruction of visual stimuli from fMRI data represents a major technological and scientific challenge at the forefront of contemporary neuroscience research. Deep learning techniques have played a critical role in advancing decoding models for visual stimulus reconstruction from fMRI data. Particularly, the use of advanced GANs has resulted in significant improvements in the quality of image generation, providing a powerful tool for addressing the challenges of this complex task. However, none of these studies have taken into account the inherent characteristics of the stimulus contents themselves; This, in turn, leads to unsatisfactory outcomes, as demonstrated by the inconsistent identity between reconstructed faces and ground truth in the decoding of facial images. In order to tackle this challenge, we introduce a new framework aimed at enhancing the accuracy of reconstructing facial images from fMRI data. Our key innovation involves extracting and disentangling multi-level visual information from brain signals in the latent space and optimizing high-level features for facial identity control using identity loss. Specifically, our framework uses StyleGAN inversion to extract hierarchical latent codes from images, which are then bridged to fMRI data through transformation blocks. Additionally, we introduce a multi-stage refinement method to enhance the accuracy of reconstructed faces, which involves progressively updating fMRI latent codes with custom loss functions designed for both feature- and image-wise optimization. Our experimental results demonstrate that our proposed framework effectively achieves two critical objectives: (1) accurate facial image reconstruction from fMRI data and (2) preservation of identity characteristics with a high level of consistency.
•We propose a new cross-modal decoding framework for brain-driven image reconstruction.•We leverage StyleGAN inversion to enable realistic visual quality of reconstructions.•We disentangle face representation to make hierarchical multi-modal semantic alignment.•We design progressive refinement learning to iteratively refine recovery accuracy.•We introduce contrastive loss and multi-layer identity loss to improve ID consistency. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2024.110331 |