Visual-textual sentiment classification with bi-directional multi-level attention networks

Bibliographic Details
Published in: Knowledge-Based Systems, 2019-08, Vol. 178, pp. 61-73
Authors: Xu, Jie; Huang, Feiran; Zhang, Xiaoming; Wang, Senzhang; Li, Chaozhuo; Li, Zhoujun; He, Yueying
Format: Article
Language: English
Online access: Full text
Description
Abstract: Social networks have become an inseparable part of our daily lives, and thus automatic sentiment analysis of social media content is of great significance for identifying people's viewpoints, attitudes, and emotions on social websites. Most existing works have concentrated on sentiment analysis of a single modality, such as image or text, and cannot handle social media content with multiple modalities including both image and text. Although some works have tried to conduct multi-modal sentiment analysis, the complicated correlations between the two modalities have not been fully explored. In this paper, we propose a novel Bi-Directional Multi-Level Attention (BDMLA) model to exploit the complementary and comprehensive information between the image modality and the text modality for joint visual-textual sentiment classification. Specifically, to highlight the emotional regions and words in the image–text pair, a visual attention network and a semantic attention network are proposed, respectively. The visual attention network makes region features of the image interact with multiple semantic levels of the text (word, phrase, and sentence) to obtain the attended visual features. The semantic attention network makes semantic features of the text interact with multiple visual levels of the image (global and local) to obtain the attended semantic features. Then, the attended visual and semantic features from the two attention networks are unified into a holistic framework to conduct visual-textual sentiment classification. Proof-of-concept experiments conducted on three real-world datasets verify the effectiveness of our model.
Highlights:
• Bi-directional attention to highlight the emotional regions and words.
• Multiple levels to excavate the emotional correlations between image and text.
• The experimental results demonstrate the superiority of the proposed model.
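To make the bi-directional attention idea in the abstract concrete, the following is a minimal sketch, not the authors' BDMLA implementation: the multiple semantic levels (word, phrase, sentence) and visual levels (global, local) described in the paper are collapsed to a single level each, and all module names (CrossAttention, BiDirectionalSentimentClassifier), feature dimensions, and the fusion strategy are illustrative assumptions.

```python
# Illustrative sketch only: text attends over image regions, the image attends
# over words, and the two attended features are fused for sentiment classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttention(nn.Module):
    """Attend over `values` (e.g. image regions) conditioned on a `query`
    (e.g. a sentence vector), returning a single attended feature."""

    def __init__(self, query_dim, value_dim, hidden_dim=256):
        super().__init__()
        self.q = nn.Linear(query_dim, hidden_dim)
        self.v = nn.Linear(value_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, query, values):
        # query: (batch, query_dim); values: (batch, n, value_dim)
        h = torch.tanh(self.q(query).unsqueeze(1) + self.v(values))  # (batch, n, hidden)
        weights = F.softmax(self.score(h).squeeze(-1), dim=-1)       # (batch, n)
        return (weights.unsqueeze(-1) * values).sum(dim=1)           # (batch, value_dim)


class BiDirectionalSentimentClassifier(nn.Module):
    """Toy visual-textual sentiment classifier with attention in both directions."""

    def __init__(self, region_dim=2048, word_dim=300, num_classes=2):
        super().__init__()
        self.visual_att = CrossAttention(query_dim=word_dim, value_dim=region_dim)
        self.semantic_att = CrossAttention(query_dim=region_dim, value_dim=word_dim)
        self.classifier = nn.Linear(region_dim + word_dim, num_classes)

    def forward(self, regions, words):
        # regions: (batch, n_regions, region_dim), e.g. CNN region features
        # words:   (batch, n_words, word_dim), e.g. word embeddings
        sentence = words.mean(dim=1)       # crude sentence-level summary
        global_img = regions.mean(dim=1)   # crude global image summary
        attended_visual = self.visual_att(sentence, regions)
        attended_semantic = self.semantic_att(global_img, words)
        fused = torch.cat([attended_visual, attended_semantic], dim=-1)
        return self.classifier(fused)      # sentiment logits


if __name__ == "__main__":
    model = BiDirectionalSentimentClassifier()
    regions = torch.randn(4, 36, 2048)   # 36 region features per image
    words = torch.randn(4, 20, 300)      # 20 word embeddings per caption
    print(model(regions, words).shape)   # torch.Size([4, 2])
```

In this simplified view, each direction of attention re-weights one modality using a summary of the other, which mirrors the abstract's goal of highlighting emotional regions and words before fusing both attended features for classification.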
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2019.04.018