Method and device for aligning audio and text, electronic equipment and storage medium

Bibliographic details
Main authors: CHEN CHUANYI, ZHANG CHAOGANG, XUAN XIAOGUANG
Format: Patent
Language: chi; eng
Description
Abstract: An embodiment of the invention provides a method and device for aligning audio and text, an electronic device, and a storage medium. The method comprises the following steps: acquiring a target text and the corresponding target audio; determining the first phonemes corresponding to the words in the target text according to a preset correspondence between words and phonemes; adding, according to the order of the first phonemes (the first phoneme sequence), a preset phoneme after each to-be-processed phoneme to obtain the second phonemes; obtaining a target probability that each target audio frame corresponds to each second phoneme, based on the spectral features of each target audio frame in the target audio and a pre-trained probability prediction model; determining, from the second phonemes, the target phoneme corresponding to each target audio frame, based on the target probabilities and the order of the second phonemes (the second phoneme sequence); and, according to the embodiment of the invention, determining the text to...
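
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name the decoding algorithm, so a standard monotonic Viterbi alignment is swapped in here. The lexicon `LEXICON`, the preset phoneme symbol `<sil>`, the function names, and the rule that every second phoneme occupies at least one frame are all assumptions made for illustration. The per-frame probabilities stand in for the output of the pre-trained probability prediction model.

```python
import math

# Hypothetical lexicon standing in for the preset word-to-phoneme correspondence.
LEXICON = {"hi": ["HH", "AY"], "you": ["Y", "UW"]}
PRESET = "<sil>"  # assumed preset phoneme; the abstract does not name it


def text_to_phonemes(words, lexicon=LEXICON):
    """First phonemes: look up each word and concatenate in text order."""
    return [p for w in words for p in lexicon[w]]


def add_preset(first_phonemes, preset=PRESET):
    """Second phonemes: insert the preset phoneme after every phoneme."""
    second = []
    for p in first_phonemes:
        second.extend([p, preset])
    return second


def align(frame_probs, n_states):
    """Monotonic Viterbi over frame_probs[t][s] = P(second phoneme s | frame t).

    Each frame either stays on the current phoneme or advances to the next;
    the path starts on the first phoneme and ends on the last one.
    Returns one phoneme index per frame.
    """
    n_frames = len(frame_probs)
    if n_frames < n_states:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = float("-inf")

    def logp(t, s):
        # Clamp so that a zero probability does not break math.log.
        return math.log(max(frame_probs[t][s], 1e-12))

    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = logp(0, 0)
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best > neg_inf:
                score[t][s] = best + logp(t, s)
                back[t][s] = prev
    # Trace back from the last phoneme at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def frames_to_phonemes(words, frame_probs):
    """Return the target phoneme label assigned to each audio frame."""
    second = add_preset(text_to_phonemes(words))
    return [second[s] for s in align(frame_probs, len(second))]
```

Calling `frames_to_phonemes(["hi"], probs)` with a 6x4 probability table yields one label per frame. Consecutive frames that share a label then give the time span of that phoneme, from which word-level timing can be read off.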