Mongolian speech emotion recognition method based on Whisper pre-training model

The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the following steps: Mongolian emotional speech audio data are acquired, each Mongolian audio clip corresponding to one Mongolian text; a logarithmic Mel spectrogram and prosodic (rhythm) features are extracted from the emotional speech; the logarithmic Mel spectrogram is input into the Whisper pre-training model, the intermediate features obtained from all layers of the Whisper model's encoder are processed, and they are adapted to the input dimension of the multi-head attention module through two consecutive non-linear fully connected layers; the processed spectral features and the prosodic features are input into the multi-head attention module, where the key-value pairs of the attention mechanism are computed from the spectral features and the query vectors are computed from the prosodic features; after the output of the attention module is obtained, the mean and the variance of the output are calculated.
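
The fusion stage described above (layer-wise Whisper encoder features, a two-layer non-linear adapter, cross multi-head attention with spectral keys/values and prosodic queries, then mean/variance pooling) can be sketched in PyTorch as follows. This is one reading of the abstract, not the patent's implementation: the whisper-base dimensions (6 encoder layers, 512-dimensional states), the learnable layer weighting, the 4-dimensional prosodic features, and the final classifier head are all assumptions, since the abstract breaks off after the mean and variance are computed.

```python
# Minimal sketch, assuming whisper-base encoder features; not the patent's
# reference implementation.
import torch
import torch.nn as nn


class WhisperCrossAttentionSER(nn.Module):
    def __init__(self, whisper_dim=512, n_layers=6, prosody_dim=4,
                 attn_dim=256, n_heads=4, n_emotions=4):
        super().__init__()
        # Learnable weights combining the intermediate features of all
        # Whisper encoder layers into one sequence (an assumption for the
        # abstract's "processed" layer-wise features).
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        # Two consecutive non-linear fully connected layers that adapt the
        # Whisper features to the attention module's input dimension.
        self.spec_adapter = nn.Sequential(
            nn.Linear(whisper_dim, attn_dim), nn.ReLU(),
            nn.Linear(attn_dim, attn_dim), nn.ReLU(),
        )
        # Project prosodic features (hypothetically F0 and energy contours)
        # to the same dimension so they can serve as attention queries.
        self.prosody_proj = nn.Linear(prosody_dim, attn_dim)
        # Multi-head attention: queries from the prosodic branch,
        # keys/values from the adapted spectral branch.
        self.cross_attn = nn.MultiheadAttention(attn_dim, n_heads,
                                                batch_first=True)
        # Classifier over the concatenated mean and variance (statistics
        # pooling) of the attention output; the classifier itself is an
        # assumption, as the abstract is truncated after this step.
        self.classifier = nn.Linear(2 * attn_dim, n_emotions)

    def forward(self, whisper_feats, prosody_feats):
        # whisper_feats: (n_layers, batch, T_spec, whisper_dim)
        # prosody_feats: (batch, T_pros, prosody_dim)
        w = torch.softmax(self.layer_weights, dim=0)
        fused = (w[:, None, None, None] * whisper_feats).sum(dim=0)
        kv = self.spec_adapter(fused)             # keys/values from spectrum
        q = self.prosody_proj(prosody_feats)      # queries from prosody
        attn_out, _ = self.cross_attn(q, kv, kv)  # (batch, T_pros, attn_dim)
        mean = attn_out.mean(dim=1)
        var = attn_out.var(dim=1)
        return self.classifier(torch.cat([mean, var], dim=-1))


# Usage with random tensors standing in for real features:
model = WhisperCrossAttentionSER()
whisper_feats = torch.randn(6, 2, 1500, 512)  # 6 layers, batch 2, 1500 frames
prosody_feats = torch.randn(2, 300, 4)
logits = model(whisper_feats, prosody_feats)   # (2, n_emotions)
```

With the Hugging Face transformers library, per-layer encoder features of the kind assumed here can be obtained by calling WhisperModel's encoder with output_hidden_states=True; which prosodic features the patent actually uses is not specified in the abstract.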

Bibliographic details
Authors: YUAN SHUAI; REN-QING DAOERJI; OUNIER; JI YATU; LI LEIXIAO; SHI BAO
Format: Patent (CN118506809A, published 2024-08-16)
Language: Chinese; English
Subjects: ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Online access: https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240816&DB=EPODOC&CC=CN&NR=118506809A
Source: esp@cenet