Long Dialogue Emotion Detection Based on Commonsense Knowledge Graph Guidance

Dialogue emotion detection is always challenging due to human subjectivity and the randomness of dialogue content. In a conversation, each person's emotion often develops through a cumulative process that can be influenced by many elements of uncertainty. Much commonsense knowledge, such as experiential or habitual knowledge, influences people's emotions imperceptibly. During a conversation, this commonsense knowledge can be used to enrich the semantic information of each utterance and improve the accuracy of emotion recognition. In this paper, we propose a growing graph model for dialogue emotion detection based on retrieving commonsense knowledge from the external knowledge atlas ATOMIC at both the local and the global level; the model effectively represents a dialogue as a process variable in a sequence, and the graph also captures the correlations among utterances. In particular, 1) we introduce a commonsense knowledge graph for linking the commonsense knowledge retrieved from ATOMIC, which adds auxiliary information that improves the representation of each utterance; 2) we propose a novel self-supervised learning method for extracting the latent topic of each dialogue, together with an effective optimization mechanism that makes the latent topic representations (embeddings) more distinct for the subsequent operations; and 3) finally, a cross-attention module combines the utterance features with the latent conversation topic information, so that the attention mechanism can use topic information to supplement the utterance representations and improve recognition performance. The model is tested on three popular dialogue emotion detection datasets and is empirically demonstrated to outperform state-of-the-art approaches. To further demonstrate the performance of our approach, we also build a long dialogue dataset in which the average conversation length exceeds 50 utterances; the final experimental results again demonstrate the superior performance of our approach.
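
As a rough illustration of the growing graph idea, the following minimal Python sketch (assuming the networkx library) adds one node per utterance, links consecutive utterances with temporal edges so the dialogue is represented as a sequence, and attaches retrieved commonsense facts as auxiliary nodes. The retrieve_atomic helper is a hypothetical placeholder for retrieval from ATOMIC, not the paper's actual interface.

    # Illustrative sketch only: growing a dialogue graph utterance by
    # utterance, with commonsense nodes attached. Node keys, edge kinds,
    # and retrieve_atomic() are hypothetical, not the paper's API.
    import networkx as nx

    def retrieve_atomic(utterance: str) -> list[str]:
        """Hypothetical stand-in for retrieval from the ATOMIC atlas."""
        return [f"commonsense({utterance[:20]}...)"]

    def grow_dialogue_graph(utterances: list[str]) -> nx.DiGraph:
        g = nx.DiGraph()
        for i, utt in enumerate(utterances):
            g.add_node(("utt", i), text=utt)
            if i > 0:
                # Sequential edge: the dialogue as a process in a sequence.
                g.add_edge(("utt", i - 1), ("utt", i), kind="temporal")
            for j, fact in enumerate(retrieve_atomic(utt)):
                # Link retrieved commonsense knowledge as auxiliary nodes.
                g.add_node(("cs", i, j), text=fact)
                g.add_edge(("cs", i, j), ("utt", i), kind="knowledge")
        return g

    g = grow_dialogue_graph(["I lost my keys again.", "Oh no, how annoying!"])
    print(g.number_of_nodes(), g.number_of_edges())  # 4 nodes, 3 edges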
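
The abstract does not spell out the optimization mechanism that separates the latent topic embeddings, so the sketch below substitutes a generic contrastive-style objective that penalizes high similarity between different dialogues' topic embeddings. It is a stand-in for the idea of making topic representations more distinct, not the authors' actual loss.

    # Hedged sketch of one way to push latent topic embeddings apart:
    # a generic uniformity/contrastive-style penalty, assuming PyTorch.
    import torch
    import torch.nn.functional as F

    def topic_separation_loss(topics: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
        # topics: (num_dialogues, dim), one latent topic per dialogue.
        z = F.normalize(topics, dim=-1)
        sim = z @ z.t() / temperature              # pairwise cosine similarity
        mask = torch.eye(len(z), dtype=torch.bool)
        sim = sim.masked_fill(mask, float("-inf"))  # ignore self-similarity
        # Penalize high similarity between different dialogues' topics so
        # their embeddings stay distinct for the downstream attention step.
        return torch.logsumexp(sim, dim=-1).mean()

    loss = topic_separation_loss(torch.randn(8, 256))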
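
Finally, a minimal PyTorch sketch of cross-attention in which a latent topic vector supplements the utterance representations; the dimensions, module layout, and residual design are our illustrative assumptions, not the authors' implementation.

    # Minimal sketch: utterance features as queries, topic embedding as
    # keys/values, with a residual connection so topic information
    # supplements rather than replaces the utterance representation.
    import torch
    import torch.nn as nn

    class TopicCrossAttention(nn.Module):
        def __init__(self, dim: int = 256, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, utt_feats: torch.Tensor,
                    topic: torch.Tensor) -> torch.Tensor:
            # utt_feats: (batch, num_utterances, dim) -- queries
            # topic:     (batch, num_topics, dim)     -- keys/values
            fused, _ = self.attn(utt_feats, topic, topic)
            return self.norm(utt_feats + fused)

    module = TopicCrossAttention()
    utts = torch.randn(2, 50, 256)   # a long dialogue: 50 utterances
    topic = torch.randn(2, 1, 256)   # one latent topic vector per dialogue
    print(module(utts, topic).shape)  # torch.Size([2, 50, 256])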

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 514-528
Main authors: Nie, Weizhi; Bao, Yuru; Zhao, Yue; Liu, Anan
Format: Article
Language: English
Subjects: commonsense knowledge graph; Commonsense reasoning; Correlation; Datasets; Design optimization; Emotion detection; Emotion recognition; Emotions; Graphical representations; growing graph; Knowledge; Knowledge representation; Oral communication; Performance enhancement; Process variables; Self-supervised learning; Social networking (online); Speech recognition; topic module; Transformers
DOI: 10.1109/TMM.2023.3267295
ISSN: 1520-9210
EISSN: 1941-0077
Online access: order full text