Multi-stream spectral representation for statistical parametric speech synthesis

Bibliographic details
Main authors:	Yannis Stylianou, Kayoko Yanagisawa, Ranniery Da Silva Maia
Format:	Patent
Language:	English
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	A text-to-speech (TTS) synthesizer receives text input (15, Fig. 1), converts the text into linguistic units (e.g. phonemes or graphemes), and then converts these units into candidate speech vectors. To do so, it models the higher and lower spectral frequencies of the speech data as separate high and low spectral streams, applying different statistical models (e.g. deep neural networks (DNNs), or decision trees following a hidden Markov model (HMM)) to the lower and higher spectral frequencies respectively. One set of statistical models may be fitted more tightly to one of the streams (e.g. the lower-frequency stream). The speech may be modelled as an excitation signal with fundamental frequency (f0) and band aperiodicity (bap) as separate streams, and Mel-scaled line spectral pairs (MSP) parameterization may be employed.
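
The two-stream idea in the summary can be illustrated with a minimal sketch. It is not the patented method: ridge regression stands in for the DNN or decision-tree models, the data are random placeholders, and "fitted more tightly" is interpreted, as an assumption, as weaker regularisation on the low-frequency stream. All names and sizes (`split_streams`, `fit_ridge`, `n_low`, and so on) are hypothetical.

```python
import numpy as np

def split_streams(spectral, n_low):
    """Split per-frame spectral coefficients into low and high streams."""
    return spectral[:, :n_low], spectral[:, n_low:]

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Illustrative sizes only (assumptions, not values from the patent).
n_frames, n_ling, n_spec, n_low = 200, 10, 40, 16

rng = np.random.default_rng(0)
X = rng.normal(size=(n_frames, n_ling))   # placeholder linguistic features
Y = rng.normal(size=(n_frames, n_spec))   # placeholder spectral parameters

# Model the low and high spectral streams separately.
Y_low, Y_high = split_streams(Y, n_low)
W_low = fit_ridge(X, Y_low, alpha=0.01)    # weak regularisation: tighter fit
W_high = fit_ridge(X, Y_high, alpha=10.0)  # strong regularisation: looser fit

# Recombine the per-stream predictions into full spectral vectors.
pred = np.hstack([X @ W_low, X @ W_high])
```

Using a separate model per stream lets the fit on each band be controlled independently; in this sketch that control is a single regularisation weight per stream.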