CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information

Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, we propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Expert Encoders for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space; by feeding the mapped embeddings to a pretrained generative model, the framework can reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively. Code: https://github.com/XiaoZhangYES/CognitionCapturer.
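The abstract describes a multi-stage pipeline: modality-specific EEG encoders, contrastive alignment with CLIP embeddings, and a diffusion prior feeding a frozen generative model. The minimal PyTorch sketch below illustrates the first two stages under stated assumptions: all names (EEGExpertEncoder, clip_style_loss), shapes, and hyperparameters are illustrative, not the paper's actual code, and a plain MLP regressor stands in for the paper's diffusion prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGExpertEncoder(nn.Module):
    """Toy stand-in for one Modality Expert Encoder (illustrative only)."""
    def __init__(self, n_channels=64, n_samples=250, dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                         # (B, C, T) -> (B, C*T)
            nn.Linear(n_channels * n_samples, 1024),
            nn.GELU(),
            nn.Linear(1024, dim),                 # project into a CLIP-sized space
        )

    def forward(self, eeg):
        return F.normalize(self.net(eeg), dim=-1)

def clip_style_loss(eeg_emb, target_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired EEG/target embeddings."""
    logits = (eeg_emb @ target_emb.T) / temperature
    labels = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

# Dummy batch: 8 EEG trials paired with precomputed CLIP image embeddings.
eeg = torch.randn(8, 64, 250)
clip_img_emb = F.normalize(torch.randn(8, 768), dim=-1)

encoder = EEGExpertEncoder()
loss = clip_style_loss(encoder(eeg), clip_img_emb)
loss.backward()  # one alignment step; a real run would loop over a dataset
print(f"alignment loss: {loss.item():.3f}")

# Stage 2 (simplified): the paper maps EEG embeddings into CLIP space with a
# diffusion prior; a plain MLP stands in for it here. Its output would then
# condition a frozen pretrained generative model to reconstruct the stimulus.
prior = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
with torch.no_grad():
    clip_like_emb = F.normalize(prior(encoder(eeg)), dim=-1)  # (8, 768)
```

Because all learning happens on the EEG side, the CLIP encoder and the downstream generative model can remain frozen, which is consistent with the abstract's claim that no fine-tuning of the generative models is required.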

Bibliographic Details
Main authors: Zhang, Kaifan; He, Lihuo; Jiang, Xin; Lu, Wen; Wang, Di; Gao, Xinbo
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
Date: 2024-12-13
DOI: 10.48550/arxiv.2412.10489
Source: arXiv.org
Online access: Order full text