Cross-Modal Multitask Transformer for End-to-End Multimodal Aspect-Based Sentiment Analysis

Bibliographic details
Published in:	Information processing & management 2022-09, Vol.59 (5), p.103038, Article 103038
Main authors:	Yang, Li; Na, Jin-Cheon; Yu, Jianfei
Format:	Article
Language:	English
Keywords:	Aspect-Based Sentiment Analysis; Fine-grained opinion mining; Multimodal Sentiment Analysis
Online access:	Full text

Description
Abstract:	As an emerging task in opinion mining, End-to-End Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract all the aspect-sentiment pairs mentioned in a sentence-image pair. Most existing MABSA methods do not explicitly incorporate aspect and sentiment information in their textual and visual representations, and they fail to consider the different contributions of visual representations to each word or aspect in the text. To tackle these limitations, we propose a multi-task learning framework named Cross-Modal Multitask Transformer (CMMT), which incorporates two auxiliary tasks to learn aspect/sentiment-aware intra-modal representations and introduces a Text-Guided Cross-Modal Interaction Module to dynamically control the contribution of the visual information to the representation of each word in the inter-modal interaction. Experimental results demonstrate that CMMT consistently outperforms the state-of-the-art approach JML by 3.1, 3.3, and 4.1 absolute percentage points on three Twitter datasets for the End-to-End MABSA task, respectively. Moreover, further analysis shows that CMMT is superior to comparison systems in both aspect extraction (AE) and sentiment classification (SC), which should help advance the development of multimodal AE and SC algorithms.
Highlights:
• We study a new task named End-to-End Multimodal Aspect-Based Sentiment Analysis (MABSA).
• We propose a Cross-Modal Multitask Transformer (CMMT) framework for End-to-End MABSA.
• Experimental results show that CMMT outperforms a number of existing approaches.
ISSN:	0306-4573 1873-5371
DOI:	10.1016/j.ipm.2022.103038
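
Illustrative note: the abstract describes a module that dynamically controls how much visual information contributes to each word's representation. The sketch below shows one generic way such per-word control can be expressed, using a sigmoid gate over text-attended image features. It is not the CMMT implementation; the function name `text_guided_fusion`, the parameters `w_gate` and `b_gate`, the tensor shapes, and the residual form of the fusion are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def text_guided_fusion(text, image, w_gate, b_gate):
    """Hypothetical per-word gated fusion (not the paper's module).

    text:   (n_tokens, d)  word representations
    image:  (n_regions, d) image-region representations
    w_gate: (2 * d,)       gate weights (assumed parameter)
    b_gate: float          gate bias (assumed parameter)
    """
    d = text.shape[1]
    # Each word attends over the image regions (scaled dot-product attention).
    attn = softmax(text @ image.T / np.sqrt(d), axis=-1)   # (n_tokens, n_regions)
    visual = attn @ image                                  # (n_tokens, d)
    # One scalar gate per word, computed from the word and its visual context.
    gate = sigmoid(np.concatenate([text, visual], axis=1) @ w_gate + b_gate)
    # The gate scales how much visual context is added to each word.
    fused = text + gate[:, None] * visual
    return fused, gate


rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 words, hidden size 8
image = rng.normal(size=(3, 8))   # 3 image regions
w_gate = rng.normal(size=16)
fused, gate = text_guided_fusion(text, image, w_gate, 0.0)
```

In this sketch a gate value near 0 leaves a word's representation essentially text-only, while a value near 1 adds the full attended visual context, which is the kind of per-word control the abstract refers to.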