CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information

Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, we propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Expert Encoders for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space; by feeding the mapped embeddings to a pretrained generative model, the framework can reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively. Code: https://github.com/XiaoZhangYES/CognitionCapturer.
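The abstract describes a multi-stage pipeline: modality-specific EEG encoders, contrastive alignment with CLIP embeddings, and a diffusion prior feeding a frozen generative model. The minimal PyTorch sketch below illustrates the first two stages under stated assumptions: all names (EEGExpertEncoder, clip_style_loss), shapes, and hyperparameters are illustrative, not the paper's actual code, and a plain MLP regressor stands in for the paper's diffusion prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGExpertEncoder(nn.Module):
    """Toy stand-in for one Modality Expert Encoder (illustrative only)."""
    def __init__(self, n_channels=64, n_samples=250, dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                         # (B, C, T) -> (B, C*T)
            nn.Linear(n_channels * n_samples, 1024),
            nn.GELU(),
            nn.Linear(1024, dim),                 # project into a CLIP-sized space
        )

    def forward(self, eeg):
        return F.normalize(self.net(eeg), dim=-1)

def clip_style_loss(eeg_emb, target_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired EEG/target embeddings."""
    logits = (eeg_emb @ target_emb.T) / temperature
    labels = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

# Dummy batch: 8 EEG trials paired with precomputed CLIP image embeddings.
eeg = torch.randn(8, 64, 250)
clip_img_emb = F.normalize(torch.randn(8, 768), dim=-1)

encoder = EEGExpertEncoder()
loss = clip_style_loss(encoder(eeg), clip_img_emb)
loss.backward()  # one alignment step; a real run would loop over a dataset
print(f"alignment loss: {loss.item():.3f}")

# Stage 2 (simplified): the paper maps EEG embeddings into CLIP space with a
# diffusion prior; a plain MLP stands in for it here. Its output would then
# condition a frozen pretrained generative model to reconstruct the stimulus.
prior = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
with torch.no_grad():
    clip_like_emb = F.normalize(prior(encoder(eeg)), dim=-1)  # (8, 768)
```

Because all learning happens on the EEG side, the CLIP encoder and the downstream generative model can remain frozen, which is consistent with the abstract's claim that no fine-tuning of the generative models is required.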

Bibliographic Details
Main authors: Zhang, Kaifan; He, Lihuo; Jiang, Xin; Lu, Wen; Wang, Di; Gao, Xinbo
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
Date: 2024-12-13
DOI: 10.48550/arxiv.2412.10489
Source: arXiv.org
Online access: Order full text