A Rearrangement and Restore-Based Mixer Model for Target-Oriented Multimodal Sentiment Classification


Detailed Description

Bibliographic Details
Published in: IEEE transactions on artificial intelligence 2024-06, Vol.5 (6), p.3109-3119
Main authors: Jia, Li, Ma, Tinghuai, Rong, Huan, Sheng, Victor S., Huang, Xuejian, Xie, Xintong
Format: Article
Language: eng
Subjects:
description With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment classification (TMSC) has received more attention; it aims to classify the sentiment of a target with the help of textual and associated image features. Existing methods focus on exploring fine-grained image features and incorporate complex transformer-based fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight multilayer perceptron (MLP)-based methods have been successfully applied to multimodal sentiment classification tasks. In this article, we propose an effective rearrangement and restore mixer model (RR-Mixer) for TMSC, which dedicates the interaction of image, text, and targets along the modal axis, sequential axis, and feature channel axis through rearrangement and restore operations. Specifically, we take vision transformer (ViT) and robustly optimized BERT (RoBERTa) pretrained models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix multimodal features, with the core technology consisting of rolling, grouping rearrangement, and restore operations. Moreover, we introduce an MLP Unit to learn the information of different modalities for intermodal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in terms of macro-F1.
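Two of the steps the abstract names — cosine-similarity selection of the most relevant image features, and the roll (rearrange), mix, restore pattern — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the helper names, feature shapes, and the stand-in mixing callable (in place of the paper's MLP Unit) are all hypothetical.

```python
import math

def cosine_select(text_feat, image_feats, k):
    """Rank image patch features by cosine similarity to the
    text feature and keep the top k (hypothetical helper)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ranked = sorted(image_feats, key=lambda f: cos(text_feat, f), reverse=True)
    return ranked[:k]

def rearrange_mix_restore(tokens, shift, mix):
    """Roll the token sequence (rearrangement), apply a per-token
    mixing function (stand-in for the paper's MLP Unit), then roll
    back (restore) so the original token order is preserved."""
    n = len(tokens)
    rolled = [tokens[(i - shift) % n] for i in range(n)]   # rearrange
    mixed = [mix(t) for t in rolled]                       # mix
    return [mixed[(i + shift) % n] for i in range(n)]      # restore
```

Because the restore step inverts the roll, the mixing function sees a shifted neighborhood while the output keeps the input's token order, which is the property the rearrangement-and-restore design relies on.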
DOI: 10.1109/TAI.2023.3341879
ISSN: 2691-4581
source IEEE Electronic Library (IEL)
subjects Artificial intelligence
Feature extraction
Feature mixing
Image restoration
Mixers
MLPs-based
rearrangement and restore operations
target-oriented multimodal sentiment classification
Task analysis
Transformers
Visualization