Method and device for aligning audio and text, electronic equipment and storage medium

Embodiments of the invention provide a method and device for aligning audio and text, electronic equipment, and a storage medium. The method comprises the following steps: acquiring a target text and its corresponding target audio; determining the first phonemes corresponding to the words in the target...

Bibliographic details
Main authors: CHEN CHUANYI, ZHANG CHAOGANG, XUAN XIAOGUANG
Format: Patent
Language: chi ; eng
creator CHEN CHUANYI
ZHANG CHAOGANG
XUAN XIAOGUANG
description Embodiments of the invention provide a method and device for aligning audio and text, electronic equipment, and a storage medium. The method comprises the following steps: acquiring a target text and its corresponding target audio; determining the first phonemes corresponding to the words in the target text according to a preset correspondence between words and phonemes; according to a first phoneme sequence between the first phonemes, adding a preset phoneme after each to-be-processed phoneme to obtain second phonemes; obtaining a target probability of each target audio frame corresponding to each second phoneme based on the spectral features of each target audio frame in the target audio and a pre-trained probability prediction model; based on the target probability and a second phoneme sequence between the second phonemes, determining a target phoneme corresponding to each target audio frame from the second phonemes; and according to the embodiment of the invention, determining the text to
format Patent
date 2021-10-22
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20211022&DB=EPODOC&CC=CN&NR=113536029A
fulltext fulltext_linktorsrc
language chi ; eng
recordid cdi_epo_espacenet_CN113536029A
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title Method and device for aligning audio and text, electronic equipment and storage medium
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T01%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHEN%20CHUANYI&rft.date=2021-10-22&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN113536029A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
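
The steps listed in the description field can be illustrated with a small sketch. This is not the patented implementation: the lexicon, the preset phoneme name `sil`, the per-frame probabilities, and the function names are all hypothetical, and the last step uses a standard monotonic Viterbi search as a stand-in for whatever procedure the patent actually claims. It also assumes that every first phoneme counts as "to-be-processed" and that an inserted preset phoneme may be skipped. In the record, the frame probabilities come from a pre-trained model applied to spectral features; here they are written by hand.

```python
import math

PRESET = "sil"  # hypothetical preset phoneme; the record does not name it

# Hypothetical word-to-phoneme table standing in for the preset correspondence.
LEXICON = {"hi": ["HH", "AY"], "you": ["Y", "UW"]}


def first_phonemes(text):
    """Look up the phonemes of each word, keeping the word order."""
    return [p for word in text.lower().split() for p in LEXICON[word]]


def second_phonemes(first):
    """Insert the preset phoneme after every first phoneme (assumption: all qualify)."""
    out = []
    for p in first:
        out.extend([p, PRESET])
    return out


def align(frame_probs, phonemes):
    """Assign one phoneme of `phonemes` to each frame with a monotonic Viterbi search.

    frame_probs[t] maps a phoneme label to its probability for frame t.
    A path may stay on a position, advance by one, or skip an inserted preset.
    """
    n_frames, n = len(frame_probs), len(phonemes)
    neg = -math.inf
    score = [[neg] * n for _ in range(n_frames)]
    back = [[0] * n for _ in range(n_frames)]
    score[0][0] = math.log(frame_probs[0][phonemes[0]])
    for t in range(1, n_frames):
        for j in range(n):
            cands = [j]
            if j >= 1:
                cands.append(j - 1)
            if j >= 2 and phonemes[j - 1] == PRESET:
                cands.append(j - 2)  # skip the optional preset phoneme
            best = max(cands, key=lambda k: score[t - 1][k])
            if score[t - 1][best] > neg:
                score[t][j] = score[t - 1][best] + math.log(frame_probs[t][phonemes[j]])
                back[t][j] = best
    # The path must end on the last real phoneme or on the trailing preset.
    j = max((n - 2, n - 1), key=lambda k: score[-1][k])
    if score[-1][j] == neg:
        raise ValueError("too few frames to cover the phoneme sequence")
    path = [j]
    for t in range(n_frames - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return [phonemes[k] for k in path]


# Hand-written probabilities for four frames (illustrative values only).
frames = [
    {"HH": 0.8, "AY": 0.1, "sil": 0.1},
    {"HH": 0.7, "AY": 0.2, "sil": 0.1},
    {"HH": 0.1, "AY": 0.8, "sil": 0.1},
    {"HH": 0.1, "AY": 0.1, "sil": 0.8},
]
second = second_phonemes(first_phonemes("hi"))
print(second)                 # ['HH', 'sil', 'AY', 'sil']
print(align(frames, second))  # ['HH', 'HH', 'AY', 'sil']
```

Once each frame carries a phoneme label, word boundaries in time would follow from the frame length, but the record's description is truncated before that final step, so the sketch stops at the per-frame labels.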