Speech synthesis method and device, and medium

The invention provides a speech synthesis method, device, and medium in the field of artificial intelligence. The method comprises: obtaining phoneme information to be synthesized; processing the phoneme information with a non-autoregressive acoust...

Full description

Bibliographic details
Main authors:	ZHONG RONGXIU; YANG HUIBAO; LIU YING; ZHANG SHILEI
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

creator	ZHONG RONGXIU; YANG HUIBAO; LIU YING; ZHANG SHILEI
description	The invention provides a speech synthesis method, device, and medium in the field of artificial intelligence. The method comprises: obtaining phoneme information to be synthesized; processing the phoneme information with a non-autoregressive acoustic model to obtain first Mel spectrum information corresponding to the phoneme information; and synthesizing a target voice from the first Mel spectrum information. Because a non-autoregressive acoustic model is used to process the phoneme information and produce the corresponding Mel spectrum, the parallel capability of the processor can be fully used, which increases synthesis speed. Error accumulation and error propagation are also reduced, so the robustness of speech synthesis improves along with its speed.
format	Patent
sourcerecordid	CN116913244A
date	2023-10-20
oa	free_for_read
linktohtml	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20231020&DB=EPODOC&CC=CN&NR=116913244A
fulltext	fulltext_linktorsrc
language	chi ; eng
recordid	cdi_epo_espacenet_CN116913244A
source	esp@cenet
subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
title	Speech synthesis method and device, and medium
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T14%3A48%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=ZHONG%20RONGXIU&rft.date=2023-10-20&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN116913244A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true