Multimodal deep learning emotion classification method based on voice and video

The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modality emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, then taking the fused feature f_e as input and classifying emotions through a fully connected neural network to obtain emotion labels. In this method, emotion features are extracted from the two modalities, voice and video, the resulting features are concatenated, and emotion classification is finally performed on the fused representation.
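The spectrogram preprocessing mentioned in step 2 can be illustrated with a short, hypothetical sketch; the use of torchaudio and the sample rate, FFT size, and mel-band count below are assumptions for illustration, not parameters disclosed in the patent.

```python
import torch
import torchaudio

def wav_to_log_mel(path: str, sample_rate: int = 16000) -> torch.Tensor:
    """Convert one utterance to a log-mel spectrogram (illustrative parameters)."""
    waveform, sr = torchaudio.load(path)
    if sr != sample_rate:
        # Resample so every clip shares the same time-frequency resolution.
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=1024, hop_length=256, n_mels=128
    )(waveform)
    # Log scaling yields the spectrogram image that the feature network consumes.
    return torchaudio.transforms.AmplitudeToDB()(mel)  # (channels, n_mels, frames)
```

Steps 3 and 4, two modality-specific feature extractors followed by concatenation of f_audio and f_video into f_e and a fully connected classifier, can likewise be sketched in PyTorch. This is a minimal illustration of concatenation-based feature fusion, not the patented implementation; the backbone layers, feature dimensions, and number of emotion classes are placeholders.

```python
import torch
import torch.nn as nn

class LateFusionEmotionClassifier(nn.Module):
    """Sketch of feature-level fusion: f_e = [f_audio ; f_video] -> FC classifier."""

    def __init__(self, audio_dim: int = 256, video_dim: int = 512, num_emotions: int = 7):
        super().__init__()
        # Stand-ins for the voice / video emotion feature extraction networks;
        # in practice these would be CNNs over spectrograms and video frames.
        self.audio_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(audio_dim), nn.ReLU())
        self.video_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(video_dim), nn.ReLU())
        # Fully connected classifier over the fused emotion feature vector f_e.
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_emotions),
        )

    def forward(self, spectrogram: torch.Tensor, video_frames: torch.Tensor) -> torch.Tensor:
        f_audio = self.audio_net(spectrogram)        # voice emotion feature vector
        f_video = self.video_net(video_frames)       # video emotion feature vector
        f_e = torch.cat([f_audio, f_video], dim=1)   # fused emotion feature vector
        return self.classifier(f_e)                  # logits over emotion labels

# Toy usage: random tensors stand in for a batch of preprocessed spectrograms
# (1 x 128 x 128) and stacked video frames (3 x 224 x 224).
model = LateFusionEmotionClassifier()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 7])
```

Concatenation is the simplest form of feature-level fusion; the fully connected layers that follow it are what learn interactions between the voice and video features.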

Detailed description

Bibliographic details
Main authors: JIANG YINHE, ZOU YUJIAN, YING NA, ZHAO JIAN
Format: Patent
Language: chi ; eng
Subjects:
Online access: order full text
creator JIANG YINHE
ZOU YUJIAN
YING NA
ZHAO JIAN
description The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modality emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, then taking the fused feature f_e as input and classifying emotions through a fully connected neural network to obtain emotion labels. In this method, emotion features are extracted from the two modalities, voice and video, the resulting features are concatenated, and emotion classification is finally performed on the fused representation.
format Patent
fullrecord Patent CN118115911A, published 2024-05-31, open access; full record via esp@cenet: https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240531&DB=EPODOC&CC=CN&NR=118115911A
fulltext fulltext_linktorsrc
language chi ; eng
recordid cdi_epo_espacenet_CN118115911A
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title Multimodal deep learning emotion classification method based on voice and video
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A57%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=JIANG%20YINHE&rft.date=2024-05-31&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118115911A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true