A Rearrangement and Restore-Based Mixer Model for Target-Oriented Multimodal Sentiment Classification
With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment classification (TMSC) has received more attention; it aims to classify the sentiment of a target with the help of textual and associated image features. Existing methods focus on exploring fine-grained image features and incorporate transformer-based complex fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight multilayer perceptron (MLP)-based methods have been successfully applied to multimodal sentiment classification tasks. In this article, we propose an effective rearrangement and restore mixer model (RR-Mixer) for TMSC, which models the interaction of image, text, and targets along the modal axis, sequential axis, and feature-channel axis through rearrangement and restore operations. Specifically, we take vision transformer (ViT) and robustly optimized BERT (RoBERTa) pretrained models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix multimodal features, with the core technology consisting of rolling, grouping rearrangement, and restore operations. Moreover, we introduce an MLP Unit to learn the information of different modalities for intermodal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in terms of macro-F1.
Saved in:
Published in: | IEEE transactions on artificial intelligence 2024-06, Vol.5 (6), p.3109-3119 |
---|---|
Main authors: | Jia, Li; Ma, Tinghuai; Rong, Huan; Sheng, Victor S.; Huang, Xuejian; Xie, Xintong |
Format: | Article |
Language: | eng |
Subjects: | Artificial intelligence; Feature extraction; Feature mixing; Image restoration; Mixers; MLPs-based; rearrangement and restore operations; target-oriented multimodal sentiment classification; Task analysis; Transformers; Visualization |
Online access: | Order full text |
creator | Jia, Li; Ma, Tinghuai; Rong, Huan; Sheng, Victor S.; Huang, Xuejian; Xie, Xintong |
description | With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment classification (TMSC) has received more attention; it aims to classify the sentiment of a target with the help of textual and associated image features. Existing methods focus on exploring fine-grained image features and incorporate transformer-based complex fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight multilayer perceptron (MLP)-based methods have been successfully applied to multimodal sentiment classification tasks. In this article, we propose an effective rearrangement and restore mixer model (RR-Mixer) for TMSC, which models the interaction of image, text, and targets along the modal axis, sequential axis, and feature-channel axis through rearrangement and restore operations. Specifically, we take vision transformer (ViT) and robustly optimized BERT (RoBERTa) pretrained models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix multimodal features, with the core technology consisting of rolling, grouping rearrangement, and restore operations. Moreover, we introduce an MLP Unit to learn the information of different modalities for intermodal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in terms of macro-F1. |
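The description names two concrete mechanisms: selecting the image-region features most semantically similar to the text via cosine similarity, and mixing features by rolling and grouping along an axis before restoring the original layout. Below is a minimal NumPy sketch of both ideas under generic feature arrays; the function names, shapes, and the toy per-group mixing step are illustrative assumptions, not the paper's actual RR-Mixer implementation.

```python
import numpy as np

def cosine_select(text_feat, image_feats, k=2):
    """Pick the k image-region features most cosine-similar to the text feature.

    text_feat:  (dim,) pooled text representation
    image_feats: (num_regions, dim) region features
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = unit(image_feats) @ unit(text_feat)   # (num_regions,) cosine scores
    top = np.argsort(sims)[::-1][:k]             # indices of the k best regions
    return image_feats[top]

def roll_group_mix(x, shift=1, groups=2):
    """Toy rearrange-and-restore step over a (seq, dim) feature matrix:
    roll along the sequence axis, mix within channel groups, roll back."""
    seq, dim = x.shape
    rolled = np.roll(x, shift, axis=0)                 # rearrange: roll sequence
    g = rolled.reshape(seq, groups, dim // groups)     # rearrange: group channels
    mixed = g - g.mean(axis=2, keepdims=True)          # stand-in for the MLP Unit
    restored = np.roll(mixed.reshape(seq, dim), -shift, axis=0)  # restore layout
    return restored
```

The point of the restore step is that downstream layers still see features in the original sequence/channel order, so rearrangement only changes *which* elements each cheap MLP mixes, not the tensor layout the rest of the model expects.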
doi_str_mv | 10.1109/TAI.2023.3341879 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2691-4581 |
recordid | cdi_ieee_primary_10354512 |
source | IEEE Electronic Library (IEL) |
subjects | Artificial intelligence; Feature extraction; Feature mixing; Image restoration; Mixers; MLPs-based; rearrangement and restore operations; target-oriented multimodal sentiment classification; Task analysis; Transformers; Visualization |
title | A Rearrangement and Restore-Based Mixer Model for Target-Oriented Multimodal Sentiment Classification |