A Rearrangement and Restore-Based Mixer Model for Target-Oriented Multimodal Sentiment Classification
With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment classification (TMSC) has received more attention; it aims to classify the sentiment of a target with the help of textual and associated image features. Existing methods focus on exploring fine-grained image features and incorporate transformer-based complex fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight multilayer perceptron (MLP)-based methods have been successfully applied to multimodal sentiment classification tasks. In this article, we propose an effective rearrangement and restore mixer model (RR-Mixer) for TMSC, which models the interaction of image, text, and targets along the modal axis, sequential axis, and feature-channel axis through rearrangement and restore operations. Specifically, we take vision transformer (ViT) and robustly optimized BERT (RoBERTa) pretrained models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix multimodal features, with the core technology consisting of rolling, grouping rearrangement, and restore operations. Moreover, we introduce an MLP Unit to learn the information of different modalities for intermodal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in terms of macro-F1.
Saved in:
Published in: | IEEE transactions on artificial intelligence 2024-06, Vol.5 (6), p.3109-3119 |
---|---|
Main authors: | Jia, Li; Ma, Tinghuai; Rong, Huan; Sheng, Victor S.; Huang, Xuejian; Xie, Xintong |
Format: | Article |
Language: | eng |
Subjects: | Artificial intelligence; Feature extraction; Feature mixing; Image restoration; Mixers; MLPs-based; rearrangement and restore operations; target-oriented multimodal sentiment classification; Task analysis; Transformers; Visualization |
Online access: | Order full text |
creator | Jia, Li; Ma, Tinghuai; Rong, Huan; Sheng, Victor S.; Huang, Xuejian; Xie, Xintong |
description | With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment classification (TMSC) has received more attention; it aims to classify the sentiment of a target with the help of textual and associated image features. Existing methods focus on exploring fine-grained image features and incorporate transformer-based complex fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight multilayer perceptron (MLP)-based methods have been successfully applied to multimodal sentiment classification tasks. In this article, we propose an effective rearrangement and restore mixer model (RR-Mixer) for TMSC, which models the interaction of image, text, and targets along the modal axis, sequential axis, and feature-channel axis through rearrangement and restore operations. Specifically, we take vision transformer (ViT) and robustly optimized BERT (RoBERTa) pretrained models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix multimodal features, with the core technology consisting of rolling, grouping rearrangement, and restore operations. Moreover, we introduce an MLP Unit to learn the information of different modalities for intermodal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in terms of macro-F1. |
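The description names two concrete mechanisms: selecting the image-region features most semantically similar to the text via cosine similarity, and mixing features by rolling and grouping along an axis before restoring the original layout. Below is a minimal NumPy sketch of both ideas under generic feature arrays; the function names, shapes, and the toy per-group mixing step are illustrative assumptions, not the paper's actual RR-Mixer implementation.

```python
import numpy as np

def cosine_select(text_feat, image_feats, k=2):
    """Pick the k image-region features most cosine-similar to the text feature.

    text_feat:  (dim,) pooled text representation
    image_feats: (num_regions, dim) region features
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = unit(image_feats) @ unit(text_feat)   # (num_regions,) cosine scores
    top = np.argsort(sims)[::-1][:k]             # indices of the k best regions
    return image_feats[top]

def roll_group_mix(x, shift=1, groups=2):
    """Toy rearrange-and-restore step over a (seq, dim) feature matrix:
    roll along the sequence axis, mix within channel groups, roll back."""
    seq, dim = x.shape
    rolled = np.roll(x, shift, axis=0)                 # rearrange: roll sequence
    g = rolled.reshape(seq, groups, dim // groups)     # rearrange: group channels
    mixed = g - g.mean(axis=2, keepdims=True)          # stand-in for the MLP Unit
    restored = np.roll(mixed.reshape(seq, dim), -shift, axis=0)  # restore layout
    return restored
```

The point of the restore step is that downstream layers still see features in the original sequence/channel order, so rearrangement only changes *which* elements each cheap MLP mixes, not the tensor layout the rest of the model expects.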
doi_str_mv | 10.1109/TAI.2023.3341879 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2691-4581 |
recordid | cdi_ieee_primary_10354512 |
source | IEEE Electronic Library (IEL) |
subjects | Artificial intelligence; Feature extraction; Feature mixing; Image restoration; Mixers; MLPs-based; rearrangement and restore operations; target-oriented multimodal sentiment classification; Task analysis; Transformers; Visualization |
title | A Rearrangement and Restore-Based Mixer Model for Target-Oriented Multimodal Sentiment Classification |