StyleBERT: Text-audio sentiment analysis with Bi-directional Style Enhancement
Published in: Information Systems (Oxford), March 2023, Vol. 114, Article 102147
Format: Article
Language: English
Online access: Full text
Abstract: Recent multimodal sentiment analysis works focus on establishing sophisticated fusion strategies for better performance. However, a major limitation of these works is that they ignore effective modality representation learning before fusion. In this work, we propose a novel text-audio sentiment analysis framework, named StyleBERT, to enhance the emotional information of unimodal representations by learning distinct modality styles, such that the model already obtains an effective unimodal representation before fusion, which mitigates the reliance on fusion. In particular, we propose a Bi-directional Style Enhancement module, which learns one contextualized style representation and two differentiated style representations for each modality, where the relevant semantic information across modalities and the discriminative characteristics of each modality will be captured. Furthermore, to learn fine-grained acoustic representation, we only use the directly available Log-Mel spectrograms as audio modality inputs and encode them with a multi-head self-attention mechanism. Comprehensive experimental results on three widely-used benchmark datasets demonstrate that the proposed StyleBERT is an effective multimodal framework and significantly outperforms the state-of-the-art multimodal baselines. Our code is available at https://github.com/lsq960124/StyleBERT.
Highlights:
• Effective modality representation learning should be performed before multimodal fusion.
• Distinct modality style representations are learned to enhance the representation ability of each modality.
• Comparable sentiment analysis results can be achieved using only directly available Log-Mel spectrograms (sketched below).
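To make the acoustic branch described in the abstract concrete, the following is a minimal sketch of encoding directly available Log-Mel spectrograms with multi-head self-attention. It assumes a PyTorch/torchaudio stack; the class name, layer sizes, and all hyperparameters are illustrative assumptions and not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' code): Log-Mel spectrogram front end
# followed by multi-head self-attention over time frames.
import torch
import torch.nn as nn
import torchaudio


class AudioSelfAttentionEncoder(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_heads=4):
        super().__init__()
        # Directly available Log-Mel spectrogram features (no pretrained audio model).
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=400, hop_length=160, n_mels=n_mels
        )
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        self.proj = nn.Linear(n_mels, d_model)
        # Multi-head self-attention over the spectrogram frame sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, waveform):
        # waveform: (batch, samples) -> log-mel: (batch, n_mels, frames)
        logmel = self.to_db(self.melspec(waveform))
        frames = logmel.transpose(1, 2)   # (batch, frames, n_mels)
        x = self.proj(frames)             # (batch, frames, d_model)
        out, _ = self.attn(x, x, x)       # self-attention: query = key = value
        return out.mean(dim=1)            # pooled acoustic representation


# Usage example with a dummy batch of 1-second, 16 kHz waveforms.
encoder = AudioSelfAttentionEncoder()
audio_repr = encoder(torch.randn(2, 16000))
print(audio_repr.shape)  # torch.Size([2, 128])
```

In this sketch the pooled output would serve as the unimodal acoustic representation that is later combined with the text representation; how StyleBERT's Bi-directional Style Enhancement module derives the contextualized and differentiated style representations is described in the paper itself.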
ISSN: 0306-4379, 1873-6076
DOI: 10.1016/j.is.2022.102147