Mi-CGA: Cross-modal Graph Attention Network for robust emotion recognition in the presence of incomplete modalities
Published in: Neurocomputing (Amsterdam), 2025-03, Vol. 623, p. 129342, Article 129342
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Multimodal Emotion Recognition in Conversation (Multimodal ERC) is crucial for understanding human communication across various applications. However, the challenge of missing modalities impedes the development of robust models. Existing approaches often overlook scenarios where multiple modalities are absent simultaneously and fail to explore deep semantic interactions between modalities. Additionally, learning high-dimensional interactive features from limited samples is challenging due to missing data. This paper proposes Mi-CGA, a framework tailored for incomplete multimodal learning in conversational contexts. Mi-CGA comprises two main components: Incomplete Multimodal Representation (IMR) and Cross-modal Graph Attention Network (CGA-Net). IMR simulates incomplete modalities, while CGA-Net extracts rich information from conversational graphs. CGA-Net consists of three key modules: Modality Feature Estimation reconstructs missing data, Multi-head Graph Attention Network enhances utterance-level representation, and Cross-modal Attention Network improves conversation-level representation. Experimental results on three benchmark datasets (IEMOCAP, CMU-MOSI, and CMU-MOSEI) consistently demonstrate that Mi-CGA outperforms several representative baseline models, marking a significant advancement for the Multimodal ERC task. Source code for Mi-CGA is available at https://github.com/dangkh/Mi-CGA.
Highlights:
• CGA-Net ensures robust multimodal representation despite incomplete modalities.
• The FE module in CGA-Net is essential for reconstructing missing data.
• Combining GAT and Cross-modal Attention in CGA-Net proved effective.
• Several SOTA models are compared and extensive experiments are conducted.
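The abstract describes IMR as simulating incomplete modalities before CGA-Net reconstructs and fuses them. As a rough illustration of what such a simulation step can look like, the sketch below randomly masks whole modality feature vectors per utterance while keeping at least one modality observed. The function name, drop probability, and masking strategy are illustrative assumptions and not the authors' implementation; see the linked repository for the actual IMR procedure.

```python
# Illustrative sketch only: one common way to simulate incomplete modalities
# by zeroing out whole modality feature vectors. This is NOT the authors' IMR
# implementation; names and the drop probability are assumptions.
import numpy as np

def simulate_missing_modalities(features: dict, drop_prob: float = 0.3, seed: int = 0):
    """features: e.g. {'text': (N, d_t), 'audio': (N, d_a), 'visual': (N, d_v)} arrays.
    Returns masked copies plus a per-utterance availability mask, guaranteeing
    that at least one modality remains observed for every utterance."""
    rng = np.random.default_rng(seed)
    names = list(features)
    n = next(iter(features.values())).shape[0]
    # Sample which modalities are missing for each utterance (may drop several at once).
    missing = rng.random((n, len(names))) < drop_prob
    # Guarantee at least one observed modality per utterance.
    all_missing = missing.all(axis=1)
    keep = rng.integers(0, len(names), size=n)
    missing[all_missing, keep[all_missing]] = False
    masked = {}
    for j, name in enumerate(names):
        x = features[name].copy()
        x[missing[:, j]] = 0.0  # zero out features of the missing modality
        masked[name] = x
    availability = ~missing      # True where a modality is observed
    return masked, availability
```

A downstream model such as CGA-Net's Feature Estimation module would then be trained to reconstruct the zeroed-out entries from the remaining modalities, using the availability mask to know which features are genuine.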
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2025.129342