Multimodal deep learning emotion classification method based on voice and video

Bibliographic Details
Main Authors: JIANG YINHE, ZOU YUJIAN, YING NA, ZHAO JIAN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a multimodal deep learning emotion classification method based on voice and video. The method comprises the following steps: step 1, acquiring a voice and video dual-modality emotion data set; step 2, preprocessing the voice data and the video data separately; step 3, taking the preprocessed spectrogram and the preprocessed video images as input and extracting a voice emotion feature vector and a video emotion feature vector through a voice feature extraction network and a video emotion feature extraction network, respectively; and step 4, splicing the extracted voice emotion feature vector f_audio and the extracted video emotion feature vector f_video to obtain a fused emotion feature vector f_e, then taking the fused feature f_e as input and classifying emotions through a fully connected neural network to obtain emotion labels. In this method, emotion features are extracted from the two different modalities of voice and video, the resulting feature vectors are spliced together, and finally emotion classification is performed on the fused feature by the fully connected network.
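
The fusion and classification described in step 4 can be illustrated with a short sketch. The following is a minimal PyTorch example, not the patented implementation: the class name FusionClassifier, the feature dimensions, the hidden-layer size, and the number of emotion classes are illustrative assumptions; only the concatenation of f_audio and f_video into the fused feature f_e and the fully connected classifier follow the abstract.

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, audio_dim=256, video_dim=512, num_emotions=7):
        super().__init__()
        # Fully connected layers operating on the fused feature f_e = [f_audio; f_video].
        self.fc = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_emotions),
        )

    def forward(self, f_audio, f_video):
        # Splice (concatenate) the two modality feature vectors along the feature axis.
        f_e = torch.cat([f_audio, f_video], dim=1)
        return self.fc(f_e)  # logits over the emotion labels

# Dummy feature vectors stand in for the outputs of the two extraction networks.
f_audio = torch.randn(8, 256)   # batch of voice emotion feature vectors
f_video = torch.randn(8, 512)   # batch of video emotion feature vectors
logits = FusionClassifier()(f_audio, f_video)
labels = logits.argmax(dim=1)   # predicted emotion labels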