Mongolian speech emotion recognition method based on Whisper pre-training model

This Mongolian speech emotion recognition method, based on the pre-trained Whisper model, comprises the following steps: Mongolian emotional speech audio data are acquired, each Mongolian audio clip corresponding to one Mongolian text; a logarithmic Mel spectrogram and rhythm (prosodic) features are extracted from the emotional speech; ...
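The first steps of the abstract (a log-Mel spectrogram and rhythm features computed from each utterance) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the 16 kHz rate, 400-sample window, 160-sample hop, 80 mel bins and the log10 clamp to 8 decades below the peak loosely mirror Whisper's published preprocessing (with HTK-style filters and no centre padding, so values will not match Whisper exactly); treating the rhythm features as per-frame log energy plus an autocorrelation F0 estimate is an assumption; and every function name is hypothetical.

```python
import cmath
import math

# Assumed Whisper-like front-end settings: 16 kHz audio, 25 ms window, 10 ms hop, 80 mel bins.
SAMPLE_RATE = 16_000
N_FFT = 400
HOP = 160
N_MELS = 80


def frame_signal(signal, n_fft=N_FFT, hop=HOP):
    """Split a mono signal into overlapping frames, zero-padding the last one."""
    n_frames = 1 + max(0, math.ceil((len(signal) - n_fft) / hop))
    needed = (n_frames - 1) * hop + n_fft
    padded = list(signal) + [0.0] * max(0, needed - len(signal))
    return [padded[i * hop:i * hop + n_fft] for i in range(n_frames)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # HTK-style mel scale (assumption)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Unnormalized triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bins = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    return [[(f - lo) / (mid - lo) if lo < f <= mid
             else (hi - f) / (hi - mid) if mid < f < hi
             else 0.0 for f in bins]
            for lo, mid, hi in zip(edges, edges[1:], edges[2:])]


def power_spectrum(frame):
    """|DFT|^2 of a Hann-windowed frame (naive O(n^2) DFT, adequate for a sketch)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t, s in enumerate(frame)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def log_mel_spectrogram(signal):
    """Log10 mel energies, clamped to 8 decades below the peak, then (x + 4) / 4."""
    bank = mel_filterbank()
    spec = []
    for frame in frame_signal(signal):
        power = power_spectrum(frame)
        spec.append([math.log10(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                     for row in bank])
    peak = max(max(row) for row in spec)
    return [[(max(v, peak - 8.0) + 4.0) / 4.0 for v in row] for row in spec]


def autocorr(frame, lag):
    return sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))


def prosodic_features(signal, sr=SAMPLE_RATE, fmin=60.0, fmax=400.0, voicing=0.3):
    """Hypothetical rhythm features: per-frame [log RMS energy, F0 in Hz or 0.0 if unvoiced]."""
    feats = []
    for frame in frame_signal(signal):
        r0 = autocorr(frame, 0)
        f0 = 0.0
        if r0 > 0:
            lags = range(int(sr / fmax), int(sr / fmin) + 1)
            best = max(lags, key=lambda lag: autocorr(frame, lag))
            if autocorr(frame, best) / r0 > voicing:  # simple voicing decision
                f0 = sr / best
        feats.append([math.log(math.sqrt(r0 / len(frame)) + 1e-10), f0])
    return feats


tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]  # 50 ms, 200 Hz
mel = log_mel_spectrogram(tone)   # 4 frames x 80 mel bins
pros = prosodic_features(tone)    # 4 frames x [log energy, F0 in Hz]
```

Both extractors share `frame_signal`, so the two sequences come out frame-aligned. In practice the naive DFT would be replaced by an FFT, for example the `log_mel_spectrogram` function of the openai-whisper package, which also produces the exact input the pre-trained encoder expects.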

Bibliographic details
Main authors:	YUAN SHUAI, REN-QING DAOERJI, OUNIER, JI YATU, LI LEIXIAO, SHI BAO
Format:	Patent
Language:	Chinese; English
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

creator	YUAN SHUAI ; REN-QING DAOERJI ; OUNIER ; JI YATU ; LI LEIXIAO ; SHI BAO
description	This Mongolian speech emotion recognition method, based on the pre-trained Whisper model, comprises the following steps: Mongolian emotional speech audio data are acquired, each Mongolian audio clip corresponding to one Mongolian text; a logarithmic Mel spectrogram and rhythm (prosodic) features are extracted from the emotional speech; the log-Mel spectrogram is input into the pre-trained Whisper model, the intermediate features output by every layer of the Whisper encoder are processed, and the result is adapted to the input dimension of a multi-head attention module through two consecutive nonlinear fully connected layers; the processed spectral features and the rhythm features are input into the multi-head attention module, where the key-value pairs of the attention mechanism are computed from the spectral features and the query vectors are computed from the rhythm features; after the output of the attention module is obtained, the mean and variance of the output are calculated ...
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN118506809A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN118506809A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN118506809A3</originalsourceid><addsrcrecordid>eNqNisEKwjAQBXvxIOo_rB9QaBGlHqUoXtSL4LHE9jVZaHZDkv_HIn6Ap5mBWRaPm4rViY1QCkDvCF4zq1BEr1b46x7Z6UBvkzDQ3C_H8x0pRJQ5GhYWS14HTOtiMZopYfPjqthezs_2WiJohxRMD0Hu2ntdN_vq0FTH0-6f5wOBRjc6</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Mongolian speech emotion recognition method based on Whisper pre-training model</title><source>esp@cenet</source><creator>YUAN SHUAI ; REN-QING DAOERJI ; OUNIER ; JI YATU ; LI LEIXIAO ; SHI BAO</creator><creatorcontrib>YUAN SHUAI ; REN-QING DAOERJI ; OUNIER ; JI YATU ; LI LEIXIAO ; SHI BAO</creatorcontrib><description>The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the steps that Mongolian emotion speech audio data are acquired, and each Mongolian audio corresponds to one Mongolian text; extracting a logarithmic Mel spectrogram and rhythmic features from the emotional speech; the Whisper pre-training model is input into the Whisper pre-training model, then intermediate features, obtained from a Whisper model encoder part, of encoders of all layers are processed, and the multi-head attention module is adapted to the input dimension of the multi-head attention module through two continuous non-linear full-connection layers; the processed spectrum features and rhythm features are input into a multi-head attention module, key value pairs in an attention mechanism are calculated through the spectrum features, and query vectors are calculated through the rhythm features; after the output of the attention module is obtained, the mean value and the variance of the output are calcul</description><language>chi ; eng</language><subject>ACOUSTICS ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240816&DB=EPODOC&CC=CN&NR=118506809A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240816&DB=EPODOC&CC=CN&NR=118506809A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>YUAN SHUAI</creatorcontrib><creatorcontrib>REN-QING DAOERJI</creatorcontrib><creatorcontrib>OUNIER</creatorcontrib><creatorcontrib>JI YATU</creatorcontrib><creatorcontrib>LI LEIXIAO</creatorcontrib><creatorcontrib>SHI BAO</creatorcontrib><title>Mongolian speech emotion recognition method based on Whisper pre-training model</title><description>The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the steps that Mongolian emotion speech audio data are acquired, and each Mongolian audio corresponds to one Mongolian text; extracting a logarithmic Mel spectrogram and rhythmic features from the emotional speech; the Whisper pre-training model is input into the Whisper pre-training model, then intermediate features, obtained from a Whisper model encoder part, of encoders of all layers are processed, and the multi-head attention module is adapted to the input dimension of the multi-head attention module through two continuous non-linear full-connection layers; the processed spectrum features and rhythm features are input into a multi-head attention module, key value pairs in an attention mechanism are calculated through the spectrum features, and query vectors are calculated through the rhythm features; after the output of the attention module is obtained, the mean value and the variance of the output are calcul</description><subject>ACOUSTICS</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNqNisEKwjAQBXvxIOo_rB9QaBGlHqUoXtSL4LHE9jVZaHZDkv_HIn6Ap5mBWRaPm4rViY1QCkDvCF4zq1BEr1b46x7Z6UBvkzDQ3C_H8x0pRJQ5GhYWS14HTOtiMZopYfPjqthezs_2WiJohxRMD0Hu2ntdN_vq0FTH0-6f5wOBRjc6</recordid><startdate>20240816</startdate><enddate>20240816</enddate><creator>YUAN SHUAI</creator><creator>REN-QING DAOERJI</creator><creator>OUNIER</creator><creator>JI YATU</creator><creator>LI LEIXIAO</creator><creator>SHI BAO</creator><scope>EVB</scope></search><sort><creationdate>20240816</creationdate><title>Mongolian speech emotion recognition method based on Whisper pre-training model</title><author>YUAN SHUAI ; REN-QING DAOERJI ; OUNIER ; JI YATU ; LI LEIXIAO ; SHI BAO</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN118506809A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; eng</language><creationdate>2024</creationdate><topic>ACOUSTICS</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>YUAN SHUAI</creatorcontrib><creatorcontrib>REN-QING DAOERJI</creatorcontrib><creatorcontrib>OUNIER</creatorcontrib><creatorcontrib>JI YATU</creatorcontrib><creatorcontrib>LI LEIXIAO</creatorcontrib><creatorcontrib>SHI BAO</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>YUAN SHUAI</au><au>REN-QING DAOERJI</au><au>OUNIER</au><au>JI YATU</au><au>LI LEIXIAO</au><au>SHI BAO</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Mongolian speech emotion recognition method based on Whisper pre-training model</title><date>2024-08-16</date><risdate>2024</risdate><abstract>The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the steps that Mongolian emotion speech audio data are acquired, and each Mongolian audio corresponds to one Mongolian text; extracting a logarithmic Mel spectrogram and rhythmic features from the emotional speech; the Whisper pre-training model is input into the Whisper pre-training model, then intermediate features, obtained from a Whisper model encoder part, of encoders of all layers are processed, and the multi-head attention module is adapted to the input dimension of the multi-head attention module through two continuous non-linear full-connection layers; the processed spectrum features and rhythm features are input into a multi-head attention module, key value pairs in an attention mechanism are calculated through the spectrum features, and query vectors are calculated through the rhythm features; after the output of the attention module is obtained, the mean value and the variance of the output are calcul</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
language	chi ; eng
recordid	cdi_epo_espacenet_CN118506809A
source	esp@cenet
subjects	ACOUSTICS ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
title	Mongolian speech emotion recognition method based on Whisper pre-training model
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T17%3A26%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=YUAN%20SHUAI&rft.date=2024-08-16&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118506809A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
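The fusion stage in the `description` field (per-layer Whisper encoder features adapted by two nonlinear fully connected layers, multi-head attention whose keys and values come from the spectral features and whose queries come from the rhythm features, then the mean and variance of the output) can be sketched as an untrained, pure-Python forward pass. Everything beyond those steps is an assumption: the record does not say how the per-layer features are "processed", so a softmax-weighted sum over layers is used; the ReLU activations, output projection, dimensions, random initialization and the class name `ProsodyQueriedAttentionPooling` are hypothetical; and the step after pooling is not sketched because the abstract breaks off there.

```python
import math
import random


def linear(x, layer):
    """y = W x + b for one vector; layer is a (W, b) pair."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_linear(rng, n_in, n_out):
    bound = 1.0 / math.sqrt(n_in)
    return ([[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class ProsodyQueriedAttentionPooling:
    """Hypothetical sketch: queries from prosody, keys/values from Whisper encoder features."""

    def __init__(self, n_layers, whisper_dim, prosody_dim, d_model, n_heads, seed=0):
        assert d_model % n_heads == 0
        rng = random.Random(seed)
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.layer_logits = [0.0] * n_layers               # assumed per-layer mixing weights
        self.fc1 = init_linear(rng, whisper_dim, d_model)  # adapter: nonlinear FC layer 1
        self.fc2 = init_linear(rng, d_model, d_model)      # adapter: nonlinear FC layer 2
        self.wq = init_linear(rng, prosody_dim, d_model)   # queries from rhythm features
        self.wk = init_linear(rng, d_model, d_model)       # keys from spectral features
        self.wv = init_linear(rng, d_model, d_model)       # values from spectral features
        self.wo = init_linear(rng, d_model, d_model)       # assumed output projection

    def spectral(self, layer_feats):
        """Softmax-weighted sum over encoder layers, then two ReLU fully connected layers."""
        weights = softmax(self.layer_logits)
        n_frames, dim = len(layer_feats[0]), len(layer_feats[0][0])
        mixed = [[sum(w * layer[t][d] for w, layer in zip(weights, layer_feats))
                  for d in range(dim)] for t in range(n_frames)]
        return [relu(linear(relu(linear(f, self.fc1)), self.fc2)) for f in mixed]

    def __call__(self, layer_feats, prosody):
        spec = self.spectral(layer_feats)
        keys = [linear(s, self.wk) for s in spec]
        values = [linear(s, self.wv) for s in spec]
        outputs = []
        for p in prosody:                                  # one query per prosodic frame
            q = linear(p, self.wq)
            heads = []
            for h in range(self.n_heads):
                lo, hi = h * self.d_head, (h + 1) * self.d_head
                scores = [sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(self.d_head)
                          for k in keys]
                attn = softmax(scores)                     # scaled dot-product attention
                heads += [sum(a * v[d] for a, v in zip(attn, values)) for d in range(lo, hi)]
            outputs.append(linear(heads, self.wo))
        # Statistics pooling over frames: concatenate per-dimension mean and variance.
        n, dim = len(outputs), len(outputs[0])
        mean = [sum(o[d] for o in outputs) / n for d in range(dim)]
        var = [sum((o[d] - mean[d]) ** 2 for o in outputs) / n for d in range(dim)]
        return mean + var


rng = random.Random(1)
layers = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)] for _ in range(4)]  # 4 layers x 6 frames x 8 dims
prosody = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(5)]                     # 5 frames x 2 features
model = ProsodyQueriedAttentionPooling(n_layers=4, whisper_dim=8, prosody_dim=2, d_model=8, n_heads=2)
embedding = model(layers, prosody)  # length 2 * d_model = 16
```

Because the queries come from the prosodic frames, the attention output has one row per prosodic frame, so the spectral and prosodic sequences need not have equal length; statistics pooling then turns either into a fixed vector of length 2 × `d_model`.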