A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis
Published in: Multimedia tools and applications, 2024-05, Vol. 83 (18), p. 54249-54278
Main authors: P, Ganesh Kumar; S, Arul Antran Vijay; V, Jothi Prakash; Paul, Anand; Nayyar, Anand
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s11042-023-17601-1
ISSN: 1380-7501; 1573-7721
EISSN: 1573-7721
Publisher: New York: Springer US
Source: SpringerLink Journals
Subjects: Ablation; Artificial intelligence; Artificial neural networks; Computer Communication Networks; Computer Science; Context; Data mining; Data processing; Data Structures and Information Theory; Datasets; Deep learning; Feature extraction; Machine learning; Multidisciplinary research; Multimedia Information Systems; Sentiment analysis; Special Purpose and Application-Based Systems
Abstract: One of the most appealing multidisciplinary research areas in Artificial Intelligence (AI) is Sentiment Analysis (SA). Due to the intricate and complementary interactions between several modalities, Multimodal Sentiment Analysis (MSA) is an extremely difficult task with a wide range of applications. Numerous deep learning models and techniques have been suggested for multimodal sentiment analysis, but they do not investigate the explicit context of words and are unable to model the diverse components of a sentence; hence, the full potential of such diverse data has not been explored. In this research, a Context-Sensitive Multi-Tier Deep Learning Framework (CS-MDF) is proposed for sentiment analysis on multimodal data. The CS-MDF uses a three-tier architecture for extracting context-sensitive information. The first tier extracts unimodal features from the utterances, ignoring context-sensitive information at this stage: a Convolutional Neural Network (CNN) extracts text-based features (CNNs suit text data because they are particularly effective at identifying local patterns and dependencies), a 3D-CNN model extracts visual features, and the open-source Media Interpretation by Large feature-space Extraction (openSMILE) toolkit extracts audio features. The second tier takes the features extracted by the first tier and derives context-sensitive unimodal features using a Bi-directional Gated Recurrent Unit (BiGRU), which captures inter-utterance links and uncovers contextual evidence. The output of the second tier is combined and passed to the third tier, which fuses the features from the different modalities and trains a single BiGRU model that provides the final classification. This method applies the BiGRU model to sequential data processing, exploiting the advantages of each modality and capturing their interdependencies. Experimental results obtained on six real-life datasets (Flickr Images dataset, Multi-View Sentiment Analysis dataset, Getty Images dataset, Balanced Twitter for Sentiment Analysis dataset, CMU-MOSI dataset) show that the proposed CS-MDF model achieves better performance than ten state-of-the-art approaches, as validated by F1 score, precision, accuracy, and recall metrics. An ablation study carried out on the proposed framework demonstrates the viability of the design, and the GradCAM visualization technique is applied to visualize the aligned input image-text pairs learned by the proposed CS-MDF model.
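The abstract outlines a concrete three-tier pipeline, so a small illustration may help. Below is a minimal PyTorch sketch of the tier-1 text branch only: a 1D CNN over word embeddings that yields one feature vector per utterance. All hyperparameters (vocabulary size, embedding dimension, filter count, kernel sizes) are illustrative assumptions, not the paper's published settings; the visual branch (3D-CNN) and the audio branch (openSMILE toolkit) are omitted for brevity.

```python
# Sketch of a tier-1 text feature extractor (illustrative hyperparameters,
# not the paper's). Returns one fixed-size feature vector per utterance.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """1D CNN over word embeddings, max-pooled over time."""
    def __init__(self, vocab_size=10000, embed_dim=128, n_filters=100,
                 kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One convolution per kernel size captures local n-gram patterns.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes]
        )

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each convolution over time, then concatenate all filters.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)                # (batch, n_filters * 3)

utterances = torch.randint(0, 10000, (8, 50))  # 8 utterances, 50 token ids each
print(TextCNN()(utterances).shape)             # torch.Size([8, 300])
```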
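A companion sketch covers tiers two and three, again under stated assumptions: each modality is taken to already produce a fixed-size feature vector per utterance, and the dimensions below (including the 384-dimensional audio vector) are illustrative rather than the paper's. Tier 2 runs one BiGRU per modality across the utterances of a conversation to add context; tier 3 concatenates the three context-aware streams and trains a single BiGRU whose per-utterance output is classified.

```python
# Sketch of tiers 2 and 3 of the CS-MDF idea: per-modality BiGRU context
# encoders followed by a single fusion BiGRU. Feature dimensions are
# illustrative assumptions, not the published configuration.
import torch
import torch.nn as nn

class CSMDFTiers23(nn.Module):
    def __init__(self, text_dim=300, visual_dim=512, audio_dim=384,
                 hidden=128, n_classes=2):
        super().__init__()
        # Tier 2: one bidirectional GRU per modality models
        # inter-utterance context within a conversation.
        self.text_gru = nn.GRU(text_dim, hidden, batch_first=True, bidirectional=True)
        self.visual_gru = nn.GRU(visual_dim, hidden, batch_first=True, bidirectional=True)
        self.audio_gru = nn.GRU(audio_dim, hidden, batch_first=True, bidirectional=True)
        # Tier 3: fuse the three context-aware streams with a single BiGRU,
        # then classify every utterance.
        self.fusion_gru = nn.GRU(3 * 2 * hidden, hidden, batch_first=True,
                                 bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, text, visual, audio):  # each: (batch, n_utterances, dim)
        t, _ = self.text_gru(text)
        v, _ = self.visual_gru(visual)
        a, _ = self.audio_gru(audio)
        fused, _ = self.fusion_gru(torch.cat([t, v, a], dim=-1))
        return self.classifier(fused)         # (batch, n_utterances, n_classes)

# One conversation of 10 utterances with random unimodal features.
model = CSMDFTiers23()
logits = model(torch.randn(1, 10, 300), torch.randn(1, 10, 512),
               torch.randn(1, 10, 384))
print(logits.shape)                           # torch.Size([1, 10, 2])
```

The design choice the abstract emphasizes, context modeling before fusion, appears here as the per-modality GRUs running over whole conversations before any cross-modal mixing takes place.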