Multimodal deep learning emotion classification method based on voice and video
The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modal emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network, respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, taking the fused feature f_e as input, and classifying emotions with a fully connected neural network to obtain emotion labels. The method extracts emotion features from the two modalities of voice and video, concatenates the resulting features, and finally performs emotion classification.
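The fusion step described in the abstract (step 4) can be illustrated with a short sketch. The following is a minimal PyTorch example, not the patent's actual networks: the audio and video feature extractors are stand-in modules, and the feature dimensions (256 per modality), the seven emotion classes, and the input shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class FusionEmotionClassifier(nn.Module):
    """Late-fusion classifier: concatenates the audio and video emotion
    feature vectors into a fused vector f_e and classifies it with a
    fully connected network (as in step 4 of the abstract)."""

    def __init__(self, audio_dim=256, video_dim=256, num_classes=7):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, f_audio, f_video):
        # Concatenate ("splice") the two modality feature vectors.
        f_e = torch.cat([f_audio, f_video], dim=1)
        return self.classifier(f_e)


# Placeholder feature extractors; the patent's actual speech and video
# emotion feature networks are not specified in this record.
audio_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))  # stand-in speech branch
video_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))  # stand-in video branch

spectrograms = torch.randn(8, 1, 128, 128)  # batch of preprocessed spectrograms (assumed shape)
frames = torch.randn(8, 3, 112, 112)        # batch of preprocessed video frames (assumed shape)

f_audio = audio_net(spectrograms)
f_video = video_net(frames)

model = FusionEmotionClassifier(num_classes=7)
logits = model(f_audio, f_video)   # emotion class scores
labels = logits.argmax(dim=1)      # predicted emotion labels
```

Concatenating at the feature level keeps the two modality branches independent until classification, which matches the extract-then-splice-then-classify flow the abstract describes.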
Saved in:
Main Authors: | JIANG YINHE; ZOU YUJIAN; YING NA; ZHAO JIAN
Format: | Patent
Language: | Chinese; English
Subjects: | Acoustics; Calculating; Computer systems based on specific computational models; Computing; Counting; Musical instruments; Physics; Speech analysis or synthesis; Speech or audio coding or decoding; Speech or voice processing; Speech recognition
Online Access: | Order full text
creator | JIANG YINHE; ZOU YUJIAN; YING NA; ZHAO JIAN
description | The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modal emotion data set; step 2, preprocessing the voice data and the video data respectively; step 3, taking the preprocessed spectrogram and the preprocessed video images as input, and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network, respectively; and step 4, concatenating the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, taking the fused feature f_e as input, and classifying emotions with a fully connected neural network to obtain emotion labels. The method extracts emotion features from the two modalities of voice and video, concatenates the resulting features, and finally performs emotion classification.
format | Patent |
language | chi ; eng |
recordid | cdi_epo_espacenet_CN118115911A |
source | esp@cenet |
subjects | ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
title | Multimodal deep learning emotion classification method based on voice and video |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A57%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=JIANG%20YINHE&rft.date=2024-05-31&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118115911A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |