A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis

Bibliographic details
Published in: Multimedia Tools and Applications, 2024-05, Vol. 83 (18), p. 54249-54278
Main authors: P, Ganesh Kumar; S, Arul Antran Vijay; V, Jothi Prakash; Paul, Anand; Nayyar, Anand
Format: Article
Language: English
Subjects: Ablation; Artificial intelligence; Artificial neural networks; Computer Communication Networks; Computer Science; Context; Data mining; Data processing; Data Structures and Information Theory; Datasets; Deep learning; Feature extraction; Machine learning; Multidisciplinary research; Multimedia Information Systems; Sentiment analysis; Special Purpose and Application-Based Systems
DOI: 10.1007/s11042-023-17601-1
ISSN: 1380-7501
EISSN: 1573-7721
Online access: Full text

Description

One of the most appealing multidisciplinary research areas in Artificial Intelligence (AI) is Sentiment Analysis (SA). Owing to the intricate and complementary interactions between several modalities, Multimodal Sentiment Analysis (MSA) is an extremely challenging task with a wide range of applications. In the field of multimodal sentiment analysis, numerous deep learning models and techniques have been proposed, but they neither investigate the explicit context of words nor model the diverse components of a sentence; hence, the full potential of such diverse data has not been explored. This research proposes a Context-Sensitive Multi-Tier Deep Learning Framework (CS-MDF) for sentiment analysis on multimodal data. The CS-MDF uses a three-tier architecture to extract context-sensitive information.

The first tier extracts unimodal features from the utterances: a Convolutional Neural Network (CNN) extracts text-based features, a 3D-CNN model extracts visual features, and the open-Source Media Interpretation by Large feature-space Extraction (openSMILE) toolkit extracts audio features. This level of extraction ignores context while determining the features; CNNs suit text data because they are particularly effective at identifying local patterns and dependencies. The second tier takes the features extracted by the first tier and derives context-sensitive unimodal features with a Bi-directional Gated Recurrent Unit (BiGRU), which captures inter-utterance links and uncovers contextual evidence. The output of tier two is combined and passed to the third tier, which fuses the features from the different modalities and trains a single BiGRU model that produces the final classification. Applying the BiGRU to this sequential data exploits the advantages of each modality and captures their interdependencies.

Experimental results on six real-life datasets (Flickr Images dataset, Multi-View Sentiment Analysis dataset, Getty Images dataset, Balanced Twitter for Sentiment Analysis dataset, CMU-MOSI dataset) show that the proposed CS-MDF model outperforms ten state-of-the-art approaches, as validated by F1 score, precision, accuracy, and recall. An ablation study on the proposed framework demonstrates the viability of the design, and the GradCAM visualization technique is applied to visualize the aligned input image-text pairs learned by the CS-MDF model.
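
As a concrete illustration of the tier-one text branch, the following is a minimal PyTorch sketch of a CNN utterance-feature extractor in the spirit of the description above. The vocabulary size, embedding width, filter count, and kernel sizes are illustrative assumptions, not the configuration reported in the paper.

    import torch
    import torch.nn as nn

    class TextCNNFeatures(nn.Module):
        """Tier 1 (text): 1D convolutions over word embeddings pick up local
        n-gram patterns; max-pooling yields one feature vector per utterance."""
        def __init__(self, vocab_size=20000, embed_dim=300, num_filters=100,
                     kernel_sizes=(3, 4, 5)):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
            )

        def forward(self, token_ids):                  # (batch, seq_len)
            x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
            pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return torch.cat(pooled, dim=1)            # (batch, 300)

    utterances = torch.randint(0, 20000, (8, 40))  # 8 utterances, 40 tokens each
    print(TextCNNFeatures()(utterances).shape)     # torch.Size([8, 300])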
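
For the audio branch, the abstract names the openSMILE toolkit. Below is a sketch using the opensmile Python wrapper; the choice of the ComParE_2016 functionals feature set and the file path are assumptions for illustration, since the paper's exact openSMILE configuration is not given here.

    import opensmile

    # Functionals produce one fixed-length descriptor row per audio file.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    # 'utterance.wav' is a placeholder path; process_file returns a DataFrame.
    audio_features = smile.process_file("utterance.wav")
    print(audio_features.shape)  # one row of acoustic descriptors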
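
Tier two runs a bidirectional GRU over the sequence of per-utterance vectors produced by tier one, so that each utterance representation absorbs contextual evidence from its neighbours. A minimal sketch, with assumed dimensions:

    import torch
    import torch.nn as nn

    class ContextBiGRU(nn.Module):
        """Tier 2: context-sensitive unimodal features via a BiGRU over the
        utterance sequence of one video or document."""
        def __init__(self, feat_dim=300, hidden_dim=150):
            super().__init__()
            self.bigru = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                                bidirectional=True)

        def forward(self, utter_feats):       # (batch, num_utterances, feat_dim)
            out, _ = self.bigru(utter_feats)  # forward/backward states concatenated
            return out                        # (batch, num_utterances, 2*hidden_dim)

    videos = torch.randn(4, 10, 300)          # 4 sequences of 10 utterances
    print(ContextBiGRU()(videos).shape)       # torch.Size([4, 10, 300])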
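
Tier three concatenates the context-aware features of the three modalities per utterance and trains a single BiGRU that produces the final classification. The sketch below uses plain concatenation fusion and assumed sizes; the paper's fusion details may differ.

    import torch
    import torch.nn as nn

    class FusionBiGRU(nn.Module):
        """Tier 3: fuse modalities per utterance, model the fused sequence with
        one BiGRU, and emit per-utterance sentiment logits."""
        def __init__(self, text_dim=300, visual_dim=300, audio_dim=300,
                     hidden_dim=100, num_classes=2):
            super().__init__()
            self.bigru = nn.GRU(text_dim + visual_dim + audio_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, text, visual, audio):     # each (batch, utts, dim)
            fused = torch.cat([text, visual, audio], dim=-1)
            out, _ = self.bigru(fused)
            return self.classifier(out)             # (batch, utts, num_classes)

    t, v, a = (torch.randn(4, 10, 300) for _ in range(3))
    print(FusionBiGRU()(t, v, a).shape)             # torch.Size([4, 10, 2])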