A text guided multi-task learning network for multimodal sentiment analysis

Multimodal Sentiment Analysis (MSA) is an active area of research that leverages multimodal signals for affective understanding of user-generated videos. Existing research tends to develop sophisticated fusion techniques to fuse unimodal representations into multimodal representation and treat MSA a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neurocomputing (Amsterdam) 2023-12, Vol.560, p.126836, Article 126836
Hauptverfasser: Luo, Yuanyi, Wu, Rui, Liu, Jiafeng, Tang, Xianglong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Multimodal Sentiment Analysis (MSA) is an active area of research that leverages multimodal signals for affective understanding of user-generated videos. Existing research tends to develop sophisticated fusion techniques to fuse unimodal representations into multimodal representation and treat MSA as a single prediction task. However, we find that the text modality with the pre-trained model (BERT) learn more semantic information and dominates the training in multimodal models, inhibiting the learning of other modalities. Besides, the classification ability of each modality is also suppressed by single-task learning. In this paper, We propose a text guided multi-task learning network to enhance the semantic information of non-text modalities and improve the learning ability of unimodal networks. We conducted experiments on multimodal sentiment analysis datasets, CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results show that our method outperforms the current SOTA method.
ISSN:0925-2312
1872-8286
DOI:10.1016/j.neucom.2023.126836