M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition

Sentiment and emotion, which correspond to long-term and short-lived human feelings respectively, are closely linked to each other; consequently, sentiment analysis and emotion recognition are two interdependent tasks in natural language processing (NLP). Each task often leverages knowledge shared with the other and performs better when the two are solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm, yet none of the recent approaches has considered them in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, that solves the three problems simultaneously. At the heart of the model is a proposed interactive conversation graph layer containing three core sub-modules: (1) a local-global context connection for modeling both local and global conversational context, (2) a cross-modal connection for learning multi-modal complementarity, and (3) a cross-task connection for capturing the correlation across the two tasks. Comprehensive experiments on three benchmark datasets, MELD, MEISD, and MSED, show the effectiveness of M3GAT over state-of-the-art baselines, with margins of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. In addition, we also show the superiority of multi-task learning over the single-task framework.
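
The interactive conversation graph layer described in the abstract builds on graph attention. As a rough illustration of that underlying mechanism, the sketch below implements a generic single-head graph attention layer in PyTorch; it is not the authors' code, and all class names, dimensions, and the toy adjacency matrix are illustrative assumptions.

```python
# Minimal single-head graph attention layer (GAT-style sketch).
# Illustrative only: names, sizes, and graph structure are assumptions,
# not taken from the M3GAT paper or its released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared node projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scorer over node pairs

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (N, in_dim) node features, e.g. one node per utterance/modality/task
        # adj: (N, N) binary adjacency; in an M3GAT-style graph the edges would
        #      encode local-global context, cross-modal, and cross-task links
        z = self.W(h)                                    # (N, out_dim)
        n = z.size(0)
        # Build all pairwise concatenations [z_i || z_j] and score them
        zi = z.unsqueeze(1).expand(n, n, -1)
        zj = z.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        # Mask non-edges so attention is restricted to the graph structure
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=-1)                 # (N, N) attention weights
        return alpha @ z                                 # neighborhood-aggregated features

# Toy usage: 5 nodes, fully connected (self-loops included to avoid empty rows)
layer = GraphAttentionLayer(in_dim=16, out_dim=8)
h = torch.randn(5, 16)
adj = torch.ones(5, 5)
out = layer(h, adj)                                      # shape: (5, 8)
```

In a layer like the one the paper describes, the adjacency matrix would carry the three edge types named in the abstract (local-global context, cross-modal, and cross-task connections), so that stacking such layers propagates information along exactly those links.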

Bibliographic Details
Published in: ACM Transactions on Information Systems, 2023-08, Vol. 42 (1), p. 1-32, Article 13
Authors: Zhang, Yazhou; Jia, Ao; Wang, Bo; Zhang, Peng; Zhao, Dongming; Li, Pu; Hou, Yuexian; Jin, Xiaojia; Song, Dawei; Qin, Jing
Format: Article
Language: English
Publisher: New York, NY: ACM
DOI: 10.1145/3593583
ISSN: 1046-8188
EISSN: 1558-2868
Source: Access via ACM Digital Library
Subjects: Computing methodologies; Knowledge representation and reasoning; Natural language processing; Network design principles; Networks
Online Access: Full text