Hybrid Representation and Decision Fusion towards Visual-textual Sentiment

Bibliographic Details
Published in: ACM Transactions on Intelligent Systems and Technology, 2023-04, Vol. 14 (3), pp. 1-17, Article 48
Main authors: Yin, Chunyong; Zhang, Sun; Zeng, Qingkui
Format: Article
Language: English
Online access: Full text
Description
Abstract: The rising use of online media has changed the social customs of the public. Users have gradually become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotions and attitudes provide significant decision support for numerous tasks in sentiment analysis. Conventional sentiment analysis methods consider only the textual modality and are vulnerable in multimodal scenarios, while common multimodal approaches focus only on the interactive relationships between modalities without considering unique intra-modal information. A hybrid fusion network is proposed in this work to capture both inter-modal and intra-modal features. First, in the intermediate fusion stage, a multi-head visual attention is proposed to extract accurate semantic and sentimental information from textual embedding representations with the assistance of visual features. Then, in the late fusion stage, multiple base classifiers are trained to learn independent and diverse discriminative information from the different modal representations. The final decision is determined by fusing the decision supports from the base classifiers via a decision fusion method. To improve the generalization of the hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empirical results on multimodal datasets demonstrate that the proposed model achieves higher accuracy and better generalization than baselines for multimodal sentiment analysis.
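The late-fusion step described in the abstract, combining decision supports from multiple base classifiers into a final prediction, can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted-average fusion rule, the three hypothetical base classifiers (text-only, image-only, and jointly fused), and the three-class sentiment label set are all assumptions made for illustration.

```python
def fuse_decisions(probs_per_classifier, weights=None):
    """Combine per-class probability vectors from several base
    classifiers by weighted averaging (a simple decision-fusion rule;
    the paper's actual fusion method may differ)."""
    n = len(probs_per_classifier)
    if weights is None:
        # Default to uniform weights across base classifiers.
        weights = [1.0 / n] * n
    num_classes = len(probs_per_classifier[0])
    fused = [0.0] * num_classes
    for w, probs in zip(weights, probs_per_classifier):
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused

# Toy example: three hypothetical base classifiers voting over
# {0: negative, 1: neutral, 2: positive}.
text_probs  = [0.2, 0.1, 0.7]   # text-modality classifier
image_probs = [0.1, 0.3, 0.6]   # visual-modality classifier
joint_probs = [0.3, 0.2, 0.5]   # classifier on the fused representation
fused = fuse_decisions([text_probs, image_probs, joint_probs])
label = max(range(len(fused)), key=fused.__getitem__)
# fused averages to [0.2, 0.2, 0.6], so the final label is 2 (positive).
```

Training the base classifiers on different modal representations, together with a similarity loss that penalizes overly correlated outputs, is what supplies the "independent and diverse discriminative information" that makes such an average more robust than any single classifier.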
ISSN:2157-6904
2157-6912
DOI:10.1145/3583076