Modality- and Subject-Aware Emotion Recognition Using Knowledge Distillation

Multimodal emotion recognition has the potential to impact various fields, including human-computer interaction, virtual reality, and emotional intelligence systems. This study introduces a comprehensive framework that enhances the accuracy and computational efficiency of emotion recognition by leve...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE access 2024, Vol.12, p.122485-122502
Hauptverfasser:	Sarikaya, Mehmet Ali, Ince, Gokhan
Format:	Artikel
Sprache:	eng
Schlagworte:	Accuracy Adaptation models Brain modeling Brain-computer interface Brain-computer interfaces cross-modal distillation EEG-based models Electroencephalography Emotion recognition knowledge distillation multimodal models Multimodal sensors Solid modeling subject-independent models subject-specific models Transfer learning virtual reality
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Multimodal emotion recognition has the potential to impact various fields, including human-computer interaction, virtual reality, and emotional intelligence systems. This study introduces a comprehensive framework that enhances the accuracy and computational efficiency of emotion recognition by leveraging knowledge distillation and transfer learning, incorporating both unimodal and multimodal models. The framework also combines subject-specific and subject-independent models, achieving a balance between localization and generalization. Subject-independent models include EEG-based, non-EEG-based (i.e., electromyography, electrooculography, electrodermal activity, galvanic skin response, skin temperature, respiration, blood volume pulse, heart rate, and eye movements), and multimodal models trained on all training subjects, capturing a broader context. Subject-specific models, including EEG-based, non-EEG-based, and multimodal models, are trained on individual subjects to provide localized knowledge. The proposed framework then distills knowledge from these teacher models into a student model, utilizing six different distillation losses to combine both subject-independent and subject-specific insights. This approach makes the model subject-aware by using local patterns and modality-aware by incorporating unimodal data, enhancing the robustness and generalizability of emotion recognition systems to varied real-world scenarios. The framework was tested on two well-known datasets, SEED-V and DEAP, as well as an immersive three-Dimensional (3D) Virtual Reality (VR) dataset, GraffitiVR, which captures emotional and behavioral responses from individuals experiencing urban graffiti in a VR environment. This broader application provides insights into the effectiveness of emotion recognition models in both 2D and 3D settings, facilitating a wider range of assessment. Empirical results demonstrate that the proposed knowledge distillation-based model significantly elevates performance across all datasets when compared to traditional models. Specifically, the model demonstrated improvements ranging from 6.56% to 24.59% over unimodal models and from 1.56% to 4.11% over multimodal approaches across the SEED-V, DEAP, and GraffitiVR datasets. These results underscore the robustness and effectiveness of the proposed approach, suggesting that it significantly enhances emotion recognition processes across various environmental settings.
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2024.3452781