Method and device for aligning audio and text, electronic equipment and storage medium

The embodiment of the invention provides a method and device for aligning audio and text, electronic equipment and a storage medium. The method comprises the following steps: acquiring a target text and a corresponding target audio; determining a first phoneme corresponding to the word in the target...

Detailed description

Bibliographic details
Main authors:	CHEN CHUANYI, ZHANG CHAOGANG, XUAN XIAOGUANG
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS

creator	CHEN CHUANYI ; ZHANG CHAOGANG ; XUAN XIAOGUANG
description	The embodiment of the invention provides a method and device for aligning audio and text, electronic equipment and a storage medium. The method comprises the following steps: acquiring a target text and a corresponding target audio; determining a first phoneme corresponding to the word in the target text according to a preset corresponding relationship between the word and the phoneme; according to a first phoneme sequence between the first phonemes, adding a preset phoneme behind each to-be-processed phoneme to obtain a second phoneme; obtaining a target probability of each target audio frame corresponding to each second phoneme based on the spectrum feature of each target audio frame in the target audio and a pre-trained probability prediction model; based on the target probability and a second phoneme sequence between the second phonemes, determining a target phoneme corresponding to each target audio frame from the second phonemes; and according to the embodiment of the invention, determining the text to
format	Patent
creationdate	2021-10-22
oa	free_for_read
linktohtml	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20211022&DB=EPODOC&CC=CN&NR=113536029A
fulltext	fulltext_linktorsrc
language	chi ; eng
recordid	cdi_epo_espacenet_CN113536029A
source	esp@cenet
subjects	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS
title	Method and device for aligning audio and text, electronic equipment and storage medium
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T01%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHEN%20CHUANYI&rft.date=2021-10-22&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN113536029A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
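
Illustrative sketch (not part of the catalog record): the description field lists the alignment steps as word-to-phoneme lookup, insertion of a preset phoneme after each phoneme, per-frame probabilities for each resulting phoneme, and an order-constrained assignment of one phoneme to each frame. The Python below is a minimal sketch of such a pipeline under stated assumptions, not the patented implementation. Everything named here is hypothetical: the preset phoneme is taken to be a silence symbol "sil", it is inserted after every phoneme (the record does not say which phonemes count as "to-be-processed"), the probability model is replaced by a hand-written table, and the assignment uses a standard monotonic Viterbi search in which inserted presets may be skipped.

```python
import math

PRESET = "sil"  # hypothetical preset phoneme; the record does not name it


def words_to_phonemes(words, lexicon):
    """First phoneme sequence: look each word up in a word-to-phoneme table."""
    return [p for w in words for p in lexicon[w]]


def add_preset(phonemes, preset=PRESET):
    """Second phoneme sequence: insert the preset phoneme after each phoneme."""
    out = []
    for p in phonemes:
        out.extend([p, preset])
    return out


def align(log_probs, sequence, preset=PRESET):
    """Assign one position of `sequence` to every frame.

    log_probs[t][j] is the log-probability that frame t belongs to
    sequence[j]. The path starts at position 0, never moves backwards,
    and may skip a position only if it holds the preset phoneme.
    """
    n_frames, n_pos = len(log_probs), len(sequence)
    neg_inf = float("-inf")
    score = [[neg_inf] * n_pos for _ in range(n_frames)]
    back = [[0] * n_pos for _ in range(n_frames)]
    score[0][0] = log_probs[0][0]
    for t in range(1, n_frames):
        for j in range(n_pos):
            prev = [j]                      # stay on the same phoneme
            if j >= 1:
                prev.append(j - 1)          # advance by one
            if j >= 2 and sequence[j - 1] == preset:
                prev.append(j - 2)          # skip an inserted preset
            best = max(prev, key=lambda k: score[t - 1][k])
            if score[t - 1][best] > neg_inf:
                score[t][j] = score[t - 1][best] + log_probs[t][j]
                back[t][j] = best
    ends = [n_pos - 1]
    if n_pos >= 2 and sequence[-1] == preset:
        ends.append(n_pos - 2)              # trailing preset is optional
    j = max(ends, key=lambda k: score[-1][k])
    if score[-1][j] == neg_inf:
        raise ValueError("not enough frames to cover the phoneme sequence")
    path = [j]
    for t in range(n_frames - 1, 0, -1):    # follow back-pointers to frame 0
        j = back[t][j]
        path.append(j)
    path.reverse()
    return [sequence[k] for k in path]


# Toy usage: one word, four frames, made-up probabilities.
lexicon = {"hi": ["HH", "AY"]}              # hypothetical lexicon entry
second = add_preset(words_to_phonemes(["hi"], lexicon))
probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.80, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.02, 0.03, 0.90, 0.05],
]
log_probs = [[math.log(p) for p in row] for row in probs]
frame_labels = align(log_probs, second)
print(frame_labels)
```

Once every frame carries a phoneme label, word boundaries follow from the frames where the label moves from the last phoneme of one word to the first phoneme of the next. The description is truncated before that step, so it is not sketched here.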