ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM

Embodiments of this application provide an electronic device, a speech recognition method for the device, and a medium, and relate to speech recognition technology in the field of artificial intelligence (AI). The speech recognition method in this application includes: obtaining...

Bibliographic details
Main authors: LU, Yuewan; QIN, Lei; LIU, Hao; ZHANG, Lele
Format: Patent
Language: eng; fre; ger
creator LU, Yuewan
QIN, Lei
LIU, Hao
ZHANG, Lele
description Embodiments of this application provide an electronic device, a speech recognition method for the device, and a medium, and relate to speech recognition technology in the field of artificial intelligence (AI). The speech recognition method includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is collected by a depth camera; recognizing a mouth shape feature from the facial depth image and a voice feature from the to-be-recognized voice; fusing the voice feature and the mouth shape feature into an audio-video feature; and recognizing, based on the audio-video feature, the voice uttered by the user. Because the mouth shape feature extracted from the facial depth image is not affected by ambient light, it more accurately reflects the change in mouth shape as the user utters the voice. Fusing this mouth shape feature with the voice feature can therefore improve speech recognition accuracy.
format Patent
creationdate 2024-05-08
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240508&DB=EPODOC&CC=EP&NR=4191579A4
oa free_for_read
recordtype patent
fulltext fulltext_linktorsrc
language eng ; fre ; ger
recordid cdi_epo_espacenet_EP4191579A4
source esp@cenet
subjects ACOUSTICS
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T11%3A02%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=LU,%20Yuewan&rft.date=2024-05-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP4191579A4%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
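
The steps named in the description (extract a mouth shape feature from a facial depth image, extract a voice feature from the audio, fuse the two, then recognize) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the mean-centred depth crop used as the mouth shape feature, the per-segment amplitude used as the voice feature, fusion by simple concatenation, and the nearest-template classifier standing in for whatever recognition model the patent actually uses.

```python
from typing import Dict, List, Sequence, Tuple


def mouth_shape_feature(depth_frame: Sequence[Sequence[float]],
                        mouth_box: Tuple[int, int, int, int]) -> List[float]:
    """Crop the (hypothetical) mouth region and mean-centre its depth values.

    Depth values do not depend on scene brightness, which is the property
    the description relies on. Mean-centring is an illustrative choice.
    """
    top, bottom, left, right = mouth_box
    crop = [v for row in depth_frame[top:bottom] for v in row[left:right]]
    if not crop:
        raise ValueError("mouth_box selects no pixels")
    mean = sum(crop) / len(crop)
    return [v - mean for v in crop]


def voice_feature(samples: Sequence[float], n_segments: int = 4) -> List[float]:
    """Toy voice feature: mean absolute amplitude per equal-length segment."""
    seg = len(samples) // n_segments
    if seg == 0:
        raise ValueError("not enough samples for the requested segments")
    return [sum(abs(s) for s in samples[i * seg:(i + 1) * seg]) / seg
            for i in range(n_segments)]


def fuse(voice: Sequence[float], mouth: Sequence[float]) -> List[float]:
    """Fuse both features into one audio-video feature by concatenation."""
    return list(voice) + list(mouth)


def recognize(feature: Sequence[float],
              templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the closest template (squared Euclidean distance)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(feature, templates[label]))


# Tiny made-up inputs: a 4x4 depth frame (millimetres) and 8 audio samples.
depth_frame = [
    [500, 500, 500, 500],
    [500, 510, 512, 500],
    [500, 530, 528, 500],
    [500, 500, 500, 500],
]
samples = [0.1, -0.1, 0.5, -0.5, 0.2, -0.2, 0.0, 0.0]

av_feature = fuse(voice_feature(samples),
                  mouth_shape_feature(depth_frame, (1, 3, 1, 3)))
templates = {"open": av_feature, "closed": [0.0] * len(av_feature)}
label = recognize(av_feature, templates)
```

A real system would replace each toy function with a learned model; the sketch only shows how the two feature streams meet in a single fused vector before recognition.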