A novel transformer autoencoder for multi-modal emotion recognition with incomplete data

Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments, it is often impossible to acquire complete data on multi-modal signals, and the problem of missing modalities causes severe performance...

Detailed description

Saved in:
Bibliographic details
Published in: Neural networks, 2024-04, Vol. 172, p. 106111-106111, Article 106111
Main authors: Cheng, Cheng; Liu, Wenzhe; Fan, Zhaoxin; Feng, Lin; Jia, Ziyu
Format: Article
Language: English
Subjects:
Online access: Full text
Description: Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments, it is often impossible to acquire complete data on multi-modal signals, and the problem of missing modalities causes severe performance degradation in emotion recognition. Therefore, this paper represents the first attempt to use a transformer-based architecture, aiming to fill in the modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, this paper proposes a novel unified model called the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing the encoder to learn local and global context information within each particular modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across different modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force the decoder to fully leverage both the complete and incomplete data for emotion recognition on missing data. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the available data of the DEAP and SEED-IV datasets, and accuracies of 93.25%, 92.23%, and 81.76% are obtained on the missing data. In particular, the model gains a 5.61% advantage with 70% missing data, demonstrating that it outperforms some state-of-the-art approaches in incomplete multi-modal learning.
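The architecture outlined in the abstract lends itself to a compact sketch. The PyTorch code below is a hedged illustration, not the authors' released implementation: the module names (HybridModalityEncoder, TAE), layer sizes, channel counts, and window lengths are assumptions chosen for readability, and the emotion classification head and the decoder regularization term mentioned in the abstract are omitted.

import torch
import torch.nn as nn

class HybridModalityEncoder(nn.Module):
    # Convolutional front-end (local context) followed by a transformer
    # encoder (global context) for a single modality.
    def __init__(self, in_channels, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)     # -> (batch, time, d_model)
        return self.transformer(h)

class TAE(nn.Module):
    # Modality-specific hybrid encoders, an inter-modality transformer over
    # the concatenated token sequences, and per-modality convolutional
    # decoders that reconstruct the (possibly missing) input signals.
    def __init__(self, modality_channels, d_model=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [HybridModalityEncoder(c, d_model) for c in modality_channels])
        fusion = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.inter_modality = nn.TransformerEncoder(fusion, num_layers=2)
        self.decoders = nn.ModuleList(
            [nn.ConvTranspose1d(d_model, c, kernel_size=3, padding=1)
             for c in modality_channels])

    def forward(self, xs):                   # xs: list of (batch, channels, time)
        feats = [enc(x) for enc, x in zip(self.encoders, xs)]
        fused = self.inter_modality(torch.cat(feats, dim=1))  # cross-modal fusion
        outs, start = [], 0
        for dec, f in zip(self.decoders, feats):
            chunk = fused[:, start:start + f.size(1)].transpose(1, 2)
            outs.append(dec(chunk))          # reconstruct this modality's signal
            start += f.size(1)
        return outs

# Toy usage: a 32-channel EEG window and an 8-channel peripheral window of
# 128 samples each (shapes are illustrative, not the DEAP/SEED-IV setup).
eeg, periph = torch.randn(4, 32, 128), torch.randn(4, 8, 128)
recon_eeg, recon_periph = TAE(modality_channels=[32, 8])([eeg, periph])
print(recon_eeg.shape, recon_periph.shape)   # (4, 32, 128) and (4, 8, 128)

Training such a model would pair a reconstruction loss on the decoder outputs with a recognition loss, which is where the regularization term described in the abstract would enter.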
DOI: 10.1016/j.neunet.2024.106111
PMID: 38237444
ISSN: 0893-6080
EISSN: 1879-2782
Source: MEDLINE; Access via ScienceDirect (Elsevier)
Subjects:
Convolutional encoder
Emotion recognition
Emotions
Incomplete data
Learning
Multi-modal signals
Recognition, Psychology
Transformer autoencoder