Image–text sentiment analysis via deep multimodal attentive fusion
| Published in: | Knowledge-based systems 2019-03, Vol. 167, p. 26-37 |
|---|---|
| Main authors: | , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
| Abstract: | Sentiment analysis of social media data is crucial to understanding people's position, attitude, and opinion toward a certain event, and it has many applications such as election prediction and product evaluation. Though great effort has been devoted to the single modalities (image or text), much less has been paid to the joint analysis of multimodal data in social media. Most existing methods for multimodal sentiment analysis simply combine the different data modalities, which results in unsatisfactory sentiment classification performance. In this paper, we propose a novel image–text sentiment analysis model, Deep Multimodal Attentive Fusion (DMAF), to exploit the discriminative features and the internal correlation between visual and semantic contents within a mixed fusion framework for sentiment analysis. Specifically, to automatically focus on the discriminative regions and important words most related to the sentiment, two separate unimodal attention models are proposed to learn effective emotion classifiers for the visual and textual modalities respectively. Then, an intermediate-fusion-based multimodal attention model is proposed to exploit the internal correlation between visual and textual features for joint sentiment classification. Finally, a late fusion scheme combines the three attention models for sentiment prediction. Extensive experiments demonstrate the effectiveness of our approach on both weakly labeled and manually labeled datasets. |
|---|---|
| ISSN: | 0950-7051, 1872-7409 |
| DOI: | 10.1016/j.knosys.2019.01.019 |
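
The abstract describes three components: two unimodal attention models (visual and textual), an intermediate-fusion multimodal attention model, and a late-fusion step that averages the three predictions. The sketch below illustrates that mixed-fusion idea in the simplest possible form; it is a hypothetical NumPy toy (random weights, made-up dimensions, a dot-product attention stand-in), not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, w):
    # Attention pooling: score each region/word against a learned
    # vector w, then take the weighted sum as the representation.
    alpha = softmax(features @ w)
    return alpha @ features

# Toy inputs (hypothetical sizes): 5 image regions, 7 words, 16-d features.
regions = rng.normal(size=(5, 16))
words = rng.normal(size=(7, 16))
w_v, w_t = rng.normal(size=16), rng.normal(size=16)

v = attend(regions, w_v)        # visual attention model
t = attend(words, w_t)          # textual attention model
m = np.concatenate([v, t])      # intermediate (feature-level) fusion

def classify(x, W):
    # Toy 3-class sentiment classifier (negative / neutral / positive).
    return softmax(W @ x)

p_v = classify(v, rng.normal(size=(3, 16)))
p_t = classify(t, rng.normal(size=(3, 16)))
p_m = classify(m, rng.normal(size=(3, 32)))

# Late fusion: average the three models' class probabilities.
p = (p_v + p_t + p_m) / 3
print(p.argmax())
```

In the actual model the attention scores and classifiers are learned end to end; here they are random, so only the data flow (attend, fuse, average) is meaningful.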