fNIRSNET: A multi-view spatio-temporal convolutional neural network fusion for functional near-infrared spectroscopy-based auditory event classification


Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2024-11, Vol. 137, p. 109256, Article 109256
Authors: Pandey, P., McLinden, J., Rahimi, N., Kumar, C., Shao, M., Spencer, K.M., Ostadabbas, S., Shahriari, Y.
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Multi-view learning is a rapidly evolving research area focused on developing diverse learning representations. In neural data analysis, this approach holds immense potential by capturing spatial, temporal, and frequency features. Despite its promise, the application of multi-view learning to functional near-infrared spectroscopy (fNIRS) has remained largely unexplored. This study addresses that gap by introducing fNIRSNET, a novel framework that generates and fuses multi-view spatio-temporal representations using convolutional neural networks. It investigates the combined informational strength of oxygenated (HbO2) and deoxygenated (HbR) hemoglobin signals and further extends these capabilities by integrating fNIRSNET with electroencephalography (EEG) networks to achieve robust multimodal classification. Experiments involved classifying neural responses to auditory stimuli from nine healthy participants. fNIRS signals were decomposed into HbO2/HbR concentration changes, resulting in Parallel and Merged input types. We evaluated four input types across three data compositions: balanced, subject, and complete datasets. We compared fNIRSNET's performance with that of eight baseline classification models and merged it with four common EEG networks to assess the efficacy of the combined features for multimodal classification. Compared to the baselines, fNIRSNET with the Merged input type achieved the highest accuracies of 83.22%, 81.18%, and 91.58% on the balanced, subject, and complete datasets, respectively. On the complete dataset, the approach effectively mitigated class imbalance, achieving a sensitivity of 83.58% and a specificity of 95.42%. Multimodal fusion of the EEG networks and fNIRSNET outperformed single-modality performance, with the highest accuracy of 87.15% on the balanced dataset. Overall, this study introduces an innovative fusion approach for decoding fNIRS data and illustrates its integration with established EEG networks to enhance performance.

Highlights:
•fNIRSNET, a multi-view spatio-temporal convolutional neural network fusion, was proposed for fNIRS-based auditory classification.
•Dual spatio-temporal representations of the fNIRS data were extracted and fused to enhance classification efficacy.
•The proposed fNIRSNET outperformed other well-established methods across three different dataset configurations.
•Fusing fNIRSNET with the EEG network achieved higher accuracy compared to using the unimodal EEG or fNIRS networks alone.
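To illustrate the kind of architecture the abstract describes, the minimal PyTorch sketch below passes a Merged HbO2/HbR input through two convolutional "views" (one temporal-first, one spatial-first) and concatenates their features before classification. All layer widths, kernel shapes, and the optode/time dimensions are assumptions made for illustration only; they are not taken from the published fNIRSNET specification.

```python
# Illustrative sketch of a multi-view spatio-temporal CNN for fNIRS data.
# Hypothetical hyperparameters throughout (n_optodes, n_timepoints, kernel
# sizes, feature widths); not the authors' actual fNIRSNET architecture.
import torch
import torch.nn as nn


class MultiViewFNIRSSketch(nn.Module):
    """Two convolutional views of a Merged HbO2/HbR input, fused by
    concatenation before a small classifier head."""

    def __init__(self, n_optodes=16, n_timepoints=100, n_classes=2):
        super().__init__()
        # Input treated as (batch, 2, n_optodes, n_timepoints), where the
        # two "image channels" are HbO2 and HbR (Merged input type).
        self.temporal_view = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=(1, 7), padding=(0, 3)),  # filter along time first
            nn.BatchNorm2d(8),
            nn.ELU(),
            nn.Conv2d(8, 8, kernel_size=(n_optodes, 1)),           # then collapse the spatial axis
            nn.ELU(),
            nn.AdaptiveAvgPool2d((1, 8)),
        )
        self.spatial_view = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=(n_optodes, 1)),            # filter across optodes first
            nn.BatchNorm2d(8),
            nn.ELU(),
            nn.Conv2d(8, 8, kernel_size=(1, 7), padding=(0, 3)),    # then along time
            nn.ELU(),
            nn.AdaptiveAvgPool2d((1, 8)),
        )
        self.classifier = nn.Linear(2 * 8 * 8, n_classes)

    def forward(self, x):
        # x: (batch, 2, n_optodes, n_timepoints)
        a = self.temporal_view(x).flatten(1)
        b = self.spatial_view(x).flatten(1)
        fused = torch.cat([a, b], dim=1)  # fuse the two spatio-temporal views
        return self.classifier(fused)


if __name__ == "__main__":
    model = MultiViewFNIRSSketch()
    dummy = torch.randn(4, 2, 16, 100)  # 4 trials, HbO2+HbR, 16 optodes, 100 samples
    print(model(dummy).shape)           # -> torch.Size([4, 2])
```

Under the Parallel input type mentioned in the abstract, HbO2 and HbR would presumably be fed as separate streams and fused later rather than stacked as channels; that variant, and the multimodal fusion with EEG networks, are not shown in this sketch.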
ISSN: 0952-1976
DOI: 10.1016/j.engappai.2024.109256