CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information
Format: Article
Language: English
Abstract: Electroencephalogram (EEG) signals have attracted significant attention from
researchers due to their non-invasive nature and high temporal sensitivity in
decoding visual stimuli. However, most recent studies have focused solely on
the relationship between EEG and image data pairs, neglecting the valuable
"beyond-image-modality" information embedded in EEG signals. This results in
the loss of critical multimodal information in EEG. To address this limitation,
we propose CognitionCapturer, a unified framework that fully leverages
multimodal data to represent EEG signals. Specifically, CognitionCapturer
trains Modality Expert Encoders for each modality to extract cross-modal
information from the EEG modality. It then introduces a diffusion prior to map
the EEG embedding space to the CLIP embedding space; using a pretrained
generative model, the framework can then reconstruct visual stimuli with high
semantic and structural fidelity. Notably, the framework does not require any
fine-tuning of the generative models and can be extended to incorporate more
modalities. Through extensive experiments, we demonstrate that
CognitionCapturer outperforms state-of-the-art methods both qualitatively and
quantitatively. Code: https://github.com/XiaoZhangYES/CognitionCapturer
DOI: 10.48550/arxiv.2412.10489
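
The abstract describes a three-stage pipeline: per-modality expert encoders that embed EEG, a diffusion prior mapping the EEG embedding space into CLIP space, and a frozen pretrained generative model that decodes the CLIP embedding back into an image. The following is a minimal PyTorch sketch of that pipeline, not the repository's implementation: the class names, dimensions, mean-pooling fusion, and the MLP standing in for the diffusion prior are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class ModalityExpertEncoder(nn.Module):
    """Hypothetical expert encoder: projects flattened EEG features into one
    modality's embedding space (e.g. image, text, depth)."""

    def __init__(self, eeg_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(eeg_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, embed_dim),
        )

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # L2-normalize so embeddings are comparable under a CLIP-style
        # contrastive objective.
        return nn.functional.normalize(self.proj(eeg), dim=-1)


class CognitionCapturerSketch(nn.Module):
    """One expert per modality plus a stand-in 'prior' that maps the fused
    EEG embedding into CLIP space. The paper uses a diffusion prior; a small
    MLP is substituted here only to keep the sketch self-contained."""

    def __init__(self, eeg_dim: int = 4096, clip_dim: int = 768,
                 modalities=("image", "text", "depth")):
        super().__init__()
        self.experts = nn.ModuleDict(
            {m: ModalityExpertEncoder(eeg_dim, clip_dim) for m in modalities}
        )
        self.prior = nn.Sequential(  # placeholder for the diffusion prior
            nn.Linear(clip_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # Fuse the per-modality views of the EEG signal, then map the result
        # into CLIP space. A frozen pretrained generator (e.g. an unCLIP-style
        # image decoder) would consume this embedding to reconstruct the
        # visual stimulus, so no generative-model fine-tuning is needed.
        fused = torch.stack([e(eeg) for e in self.experts.values()]).mean(0)
        return self.prior(fused)


if __name__ == "__main__":
    model = CognitionCapturerSketch()
    eeg = torch.randn(8, 4096)  # batch of flattened EEG features (assumed shape)
    clip_emb = model(eeg)
    print(clip_emb.shape)  # torch.Size([8, 768])
```

Because the generative decoder only ever sees a CLIP-space embedding, extending the framework to a new modality in this sketch amounts to adding one more expert encoder to the `ModuleDict`, which mirrors the abstract's claim that the framework is extensible without touching the generative models.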