Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework
Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach...
Saved in:
Main Authors: | Sun, Haoqin; Zhao, Shiwan; Li, Shaokai; Kong, Xiangyu; Wang, Xuechen; Kong, Aobo; Zhou, Jiaming; Chen, Yong; Zeng, Wenjia; Qin, Yong |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia; Computer Science - Sound |
Online Access: | Order full text |
creator | Sun, Haoqin; Zhao, Shiwan; Li, Shaokai; Kong, Xiangyu; Wang, Xuechen; Kong, Aobo; Zhou, Jiaming; Chen, Yong; Zeng, Wenjia; Qin, Yong
description | Multimodal emotion recognition systems rely heavily on the full availability
of modalities, suffering significant performance declines when modal data is
incomplete. To tackle this issue, we present the Cross-Modal Alignment,
Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that
sequentially engages in cross-modal alignment, reconstruction, and refinement
phases to handle missing modalities and enhance emotion recognition. This
framework utilizes unsupervised distribution-based contrastive learning to
align heterogeneous modal distributions, reducing discrepancies and modeling
semantic uncertainty effectively. The reconstruction phase applies normalizing
flow models to transform these aligned distributions and recover missing
modalities. The refinement phase employs supervised point-based contrastive
learning to disrupt semantic correlations and accentuate emotional traits,
thereby enriching the affective content of the reconstructed representations.
Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the
superior performance of CM-ARR under conditions of both missing and complete
modalities. Notably, averaged across six scenarios of missing modalities,
CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the
IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the
MSP-IMPROV dataset. |
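The alignment phase described in the abstract models each modality as a distribution and applies contrastive learning to align them. As an illustrative sketch only (not the authors' implementation, which is not reproduced in this record), one common way to realize distribution-based contrastive alignment is to represent each audio or text sample as a diagonal Gaussian and use the 2-Wasserstein distance between Gaussians as the similarity inside an InfoNCE-style loss; the function names and the temperature parameter `tau` below are hypothetical:

```python
import numpy as np

def w2_diag_gauss(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    return np.sum((mu1 - mu2) ** 2) + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)

def contrastive_align_loss(audio, text, tau=1.0):
    """Distribution-based contrastive loss (illustrative, not CM-ARR's exact form).

    audio, text: lists of (mean, variance) pairs, one per sample, where the
    i-th audio and i-th text sample belong to the same utterance (positives);
    all other pairings in the batch act as negatives.
    """
    # similarity = negative Wasserstein distance, scaled by temperature tau
    sims = np.array([[-w2_diag_gauss(a[0], a[1], t[0], t[1]) / tau
                      for t in text] for a in audio])
    # numerically stable log-softmax over each row, diagonal as target class
    logits = sims - sims.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls matched (audio, text) distributions together while pushing mismatched batch pairs apart, which matches the abstract's stated goal of reducing cross-modal discrepancies while keeping per-sample uncertainty (the variances) in the representation.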
doi_str_mv | 10.48550/arxiv.2407.09029 |
format | Article |
creationdate | 2024-07-12
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2407.09029 |
language | eng |
recordid | cdi_arxiv_primary_2407_09029 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia; Computer Science - Sound
title | Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework |