ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM

Embodiments of this application provide an electronic device, a speech recognition method therefor, and a medium, and relate to speech recognition technology in the field of artificial intelligence (AI). The speech recognition method in this application includes: obtaining...
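The abstract of this record describes fusing a voice feature and a mouth shape feature into a single audio-video feature, but it does not say how the fusion is performed. The following is only a minimal sketch of one common approach, assuming per-frame concatenation after aligning the two streams to the same number of frames. The names `resample` and `fuse_features`, the nearest-index alignment, and the toy feature values are all hypothetical and are not taken from the patent.

```python
from typing import List, Sequence

Vector = List[float]


def resample(frames: Sequence[Vector], target_len: int) -> List[Vector]:
    """Nearest-index resampling so both streams have the same frame count.

    Assumption: the depth camera and the audio front end produce features at
    different frame rates, so the mouth-shape stream is stretched or shrunk.
    """
    if not frames or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    n = len(frames)
    return [list(frames[min(n - 1, i * n // target_len)]) for i in range(target_len)]


def fuse_features(voice: Sequence[Vector], mouth: Sequence[Vector]) -> List[Vector]:
    """Concatenate per-frame voice and mouth-shape vectors.

    Concatenation is an assumed fusion rule chosen for illustration only.
    """
    mouth_aligned = resample(mouth, len(voice))
    return [list(v) + m for v, m in zip(voice, mouth_aligned)]


# Toy inputs: 4 audio frames (2 values each) and 2 depth-image frames (1 value each).
voice = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mouth = [[1.0], [2.0]]
fused = fuse_features(voice, mouth)
```

Each fused frame here holds the two voice values followed by the aligned mouth-shape value; a recognizer would then consume `fused` in place of the audio-only features.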

Detailed description

Bibliographic details
Main authors:	LU, Yuewan ; QIN, Lei ; LIU, Hao ; ZHANG, Lele
Format:	Patent
Language:	eng ; fre ; ger
Subjects:	ACOUSTICS ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	LU, Yuewan ; QIN, Lei ; LIU, Hao ; ZHANG, Lele
description	Embodiments of this application provide an electronic device, a speech recognition method therefor, and a medium, and relate to speech recognition technology in the field of artificial intelligence (AI). The speech recognition method includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is collected by a depth camera; recognizing a mouth shape feature from the facial depth image, and recognizing a voice feature from the to-be-recognized voice; and fusing the voice feature and the mouth shape feature into an audio-video feature, and recognizing, based on the audio-video feature, the voice uttered by the user. Because the mouth shape feature extracted from the facial depth image is not affected by ambient light, it can more accurately reflect how the mouth shape changes when the user utters the voice. Fusing the mouth shape feature extracted from the facial depth image with the voice feature can therefore improve speech recognition accuracy.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_EP4191579A4</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EP4191579A4</sourcerecordid><originalsourceid>FETCH-epo_espacenet_EP4191579A43</originalsourceid><addsrcrecordid>eNrjZHBx9XF1Dgny9_N0VnBxDfN0dlVw9HNRCA5wdXX2UAhydfZ39_MM8fT3U_B1DfHwd1EI8XANcnXzD9IBq_N1dfEM9eVhYE1LzClO5YXS3AwKbq4hzh66qQX58anFBYnJqXmpJfGuASaGloam5paOJsZEKAEA4tIrJQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM</title><source>esp@cenet</source><creator>LU, Yuewan ; QIN, Lei ; LIU, Hao ; ZHANG, Lele</creator><creatorcontrib>LU, Yuewan ; QIN, Lei ; LIU, Hao ; ZHANG, Lele</creatorcontrib><description>Embodiments of this application provide an electronic device, a speech recognition method therefor, and a medium, and relate to a speech recognition technology in the field of artificial intelligence (Artificial Intelligence, AI). The speech recognition method in this application includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is an image collected by using a depth camera; recognizing a mouth shape feature from the facial depth image, and recognizing a voice feature from a to-be-recognized audio; and fusing the voice feature and the mouth shape feature into an audio-video feature, and recognizing, based on the audio-video feature, a voice uttered by the user. According to the method, because the mouth shape feature extracted from the facial depth image is not affected by light of an environment, the mouth shape feature can more accurately reflect a mouth shape change obtained when the user utters the voice. 
The mouth shape feature extracted from the facial depth image and the voice feature are fused, so that speech recognition accuracy can be improved.</description><language>eng ; fre ; ger</language><subject>ACOUSTICS ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240508&DB=EPODOC&CC=EP&NR=4191579A4$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76289</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240508&DB=EPODOC&CC=EP&NR=4191579A4$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>LU, Yuewan</creatorcontrib><creatorcontrib>QIN, Lei</creatorcontrib><creatorcontrib>LIU, Hao</creatorcontrib><creatorcontrib>ZHANG, Lele</creatorcontrib><title>ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM</title><description>Embodiments of this application provide an electronic device, a speech recognition method therefor, and a medium, and relate to a speech recognition technology in the field of artificial intelligence (Artificial Intelligence, AI). 
The speech recognition method in this application includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is an image collected by using a depth camera; recognizing a mouth shape feature from the facial depth image, and recognizing a voice feature from a to-be-recognized audio; and fusing the voice feature and the mouth shape feature into an audio-video feature, and recognizing, based on the audio-video feature, a voice uttered by the user. According to the method, because the mouth shape feature extracted from the facial depth image is not affected by light of an environment, the mouth shape feature can more accurately reflect a mouth shape change obtained when the user utters the voice. The mouth shape feature extracted from the facial depth image and the voice feature are fused, so that speech recognition accuracy can be improved.</description><subject>ACOUSTICS</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZHBx9XF1Dgny9_N0VnBxDfN0dlVw9HNRCA5wdXX2UAhydfZ39_MM8fT3U_B1DfHwd1EI8XANcnXzD9IBq_N1dfEM9eVhYE1LzClO5YXS3AwKbq4hzh66qQX58anFBYnJqXmpJfGuASaGloam5paOJsZEKAEA4tIrJQ</recordid><startdate>20240508</startdate><enddate>20240508</enddate><creator>LU, Yuewan</creator><creator>QIN, Lei</creator><creator>LIU, Hao</creator><creator>ZHANG, Lele</creator><scope>EVB</scope></search><sort><creationdate>20240508</creationdate><title>ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM</title><author>LU, Yuewan ; QIN, Lei ; LIU, Hao ; ZHANG, 
Lele</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_EP4191579A43</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng ; fre ; ger</language><creationdate>2024</creationdate><topic>ACOUSTICS</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>LU, Yuewan</creatorcontrib><creatorcontrib>QIN, Lei</creatorcontrib><creatorcontrib>LIU, Hao</creatorcontrib><creatorcontrib>ZHANG, Lele</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>LU, Yuewan</au><au>QIN, Lei</au><au>LIU, Hao</au><au>ZHANG, Lele</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM</title><date>2024-05-08</date><risdate>2024</risdate><abstract>Embodiments of this application provide an electronic device, a speech recognition method therefor, and a medium, and relate to a speech recognition technology in the field of artificial intelligence (Artificial Intelligence, AI). The speech recognition method in this application includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is an image collected by using a depth camera; recognizing a mouth shape feature from the facial depth image, and recognizing a voice feature from a to-be-recognized audio; and fusing the voice feature and the mouth shape feature into an audio-video feature, and recognizing, based on the audio-video feature, a voice uttered by the user. 
According to the method, because the mouth shape feature extracted from the facial depth image is not affected by light of an environment, the mouth shape feature can more accurately reflect a mouth shape change obtained when the user utters the voice. The mouth shape feature extracted from the facial depth image and the voice feature are fused, so that speech recognition accuracy can be improved.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng ; fre ; ger
recordid	cdi_epo_espacenet_EP4191579A4
source	esp@cenet
subjects	ACOUSTICS ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
title	ELECTRONIC DEVICE AND SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T11%3A02%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=LU,%20Yuewan&rft.date=2024-05-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP4191579A4%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true