Multimodal deep learning emotion classification method based on voice and video

The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modality emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, then taking the fused feature f_e as input and classifying emotions through a fully connected neural network to obtain emotion labels. In this method, emotion features are extracted from the two modalities, voice and video, the resulting features are concatenated, and emotion classification is finally performed on the fused representation.
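The spectrogram preprocessing mentioned in step 2 can be illustrated with a short, hypothetical sketch; the use of torchaudio and the sample rate, FFT size, and mel-band count below are assumptions for illustration, not parameters disclosed in the patent.

```python
import torch
import torchaudio

def wav_to_log_mel(path: str, sample_rate: int = 16000) -> torch.Tensor:
    """Convert one utterance to a log-mel spectrogram (illustrative parameters)."""
    waveform, sr = torchaudio.load(path)
    if sr != sample_rate:
        # Resample so every clip shares the same time-frequency resolution.
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=1024, hop_length=256, n_mels=128
    )(waveform)
    # Log scaling yields the spectrogram image that the feature network consumes.
    return torchaudio.transforms.AmplitudeToDB()(mel)  # (channels, n_mels, frames)
```

Steps 3 and 4, two modality-specific feature extractors followed by concatenation of f_audio and f_video into f_e and a fully connected classifier, can likewise be sketched in PyTorch. This is a minimal illustration of concatenation-based feature fusion, not the patented implementation; the backbone layers, feature dimensions, and number of emotion classes are placeholders.

```python
import torch
import torch.nn as nn

class LateFusionEmotionClassifier(nn.Module):
    """Sketch of feature-level fusion: f_e = [f_audio ; f_video] -> FC classifier."""

    def __init__(self, audio_dim: int = 256, video_dim: int = 512, num_emotions: int = 7):
        super().__init__()
        # Stand-ins for the voice / video emotion feature extraction networks;
        # in practice these would be CNNs over spectrograms and video frames.
        self.audio_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(audio_dim), nn.ReLU())
        self.video_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(video_dim), nn.ReLU())
        # Fully connected classifier over the fused emotion feature vector f_e.
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_emotions),
        )

    def forward(self, spectrogram: torch.Tensor, video_frames: torch.Tensor) -> torch.Tensor:
        f_audio = self.audio_net(spectrogram)        # voice emotion feature vector
        f_video = self.video_net(video_frames)       # video emotion feature vector
        f_e = torch.cat([f_audio, f_video], dim=1)   # fused emotion feature vector
        return self.classifier(f_e)                  # logits over emotion labels

# Toy usage: random tensors stand in for a batch of preprocessed spectrograms
# (1 x 128 x 128) and stacked video frames (3 x 224 x 224).
model = LateFusionEmotionClassifier()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 7])
```

Concatenation is the simplest form of feature-level fusion; the fully connected layers that follow it are what learn interactions between the voice and video features.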

Detailed description

Bibliographic details
Main authors: JIANG YINHE, ZOU YUJIAN, YING NA, ZHAO JIAN
Format: Patent
Language: chi ; eng
Subjects:
Online access: order full text
creator JIANG YINHE
ZOU YUJIAN
YING NA
ZHAO JIAN
description The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modality emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, then taking the fused feature f_e as input and classifying emotions through a fully connected neural network to obtain emotion labels. In this method, emotion features are extracted from the two modalities, voice and video, the resulting features are concatenated, and emotion classification is finally performed on the fused representation.
format Patent
fullrecord Patent CN118115911A, published 2024-05-31, open access; full record via esp@cenet: https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240531&DB=EPODOC&CC=CN&NR=118115911A
fulltext fulltext_linktorsrc
language chi ; eng
recordid cdi_epo_espacenet_CN118115911A
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title Multimodal deep learning emotion classification method based on voice and video
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A57%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=JIANG%20YINHE&rft.date=2024-05-31&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118115911A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true