DCT-Based Amplitude and Frequency Modulated Harmonic-Plus-Noise Modelling for Text-to-Speech Synthesis

Bibliographic details
Main authors: Hermus, K., Van Hamme, H., Verhelst, W., Irhimeh, S., De Moortel, J.
Format: Conference proceedings
Language: English
Description
Abstract: We present a harmonic-plus-noise modelling (HNM) strategy for corpus-based text-to-speech (TTS) synthesis in which each speech phoneme is modelled in its entirety, in contrast to the traditional frame-based approach. The pitch and amplitude trajectories of each phoneme are modelled with a low-order DCT expansion. The parameter analysis algorithm is largely guided by the pitch contours and by the phonetic annotation and segmentation information that is available in any TTS system. The main advantages of our model are: few parameter interpolation points during synthesis (one per phoneme), flexible time and pitch modifications, and a reduced number of model parameters, which favours low-bit-rate coding in TTS for embedded applications. Listening tests on TTS sentences have shown that very natural speech can be obtained despite the compactness of the signal representation.
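To illustrate the idea of a per-phoneme DCT trajectory, here is a minimal sketch of the harmonic part only. The abstract does not specify the basis, the expansion order, or the fitting method. This sketch assumes a DCT-II-style cosine basis over normalised phoneme time, a least-squares fit, and a harmonic phase obtained by integrating the instantaneous frequency. The function names (`dct_basis`, `fit_dct`, `eval_dct`, `synth_harmonics`), the order of 4, and the synthetic contour are hypothetical. The noise component and the annotation-guided analysis are not covered.

```python
import numpy as np


def dct_basis(n_samples: int, order: int) -> np.ndarray:
    """Cosine basis cos(pi * k * t), k = 0..order-1, on normalised time t in (0, 1).

    Sample n sits at t = (n + 0.5) / n_samples (the DCT-II grid), so the same
    coefficients can be evaluated at any number of samples over the phoneme.
    """
    t = (np.arange(n_samples) + 0.5) / n_samples
    k = np.arange(order)
    return np.cos(np.pi * np.outer(t, k))  # shape (n_samples, order)


def fit_dct(trajectory: np.ndarray, order: int) -> np.ndarray:
    """Least-squares fit of a low-order DCT expansion to one phoneme's trajectory."""
    basis = dct_basis(len(trajectory), order)
    coeffs, *_ = np.linalg.lstsq(basis, trajectory, rcond=None)
    return coeffs


def eval_dct(coeffs: np.ndarray, n_samples: int) -> np.ndarray:
    """Evaluate the expansion at n_samples points spanning the whole phoneme."""
    return dct_basis(n_samples, len(coeffs)) @ coeffs


def synth_harmonics(f0_coeffs, amp_coeffs, n_samples, fs, pitch_scale=1.0):
    """Sum of harmonics with a DCT-modelled f0 and per-harmonic amplitude trajectories.

    amp_coeffs has shape (n_harmonics, order); row h-1 describes harmonic h.
    (This array layout is an assumption of the sketch.)
    """
    f0 = pitch_scale * eval_dct(f0_coeffs, n_samples)  # instantaneous f0 in Hz
    phase = 2 * np.pi * np.cumsum(f0) / fs            # integrate frequency -> phase
    out = np.zeros(n_samples)
    for h, a_c in enumerate(amp_coeffs, start=1):
        amp = eval_dct(a_c, n_samples)
        amp = np.where(h * f0 < fs / 2, amp, 0.0)       # mute harmonics above Nyquist
        out += amp * np.cos(h * phase)
    return out


if __name__ == "__main__":
    fs = 16000
    # Hypothetical f0 contour: 12 frames (10 ms hop) of a 120 ms phoneme, in Hz.
    frames = np.arange(12)
    frame_f0 = 120.0 + 2.0 * frames - 0.1 * frames**2
    f0_c = fit_dct(frame_f0, order=4)  # 4 numbers replace 12 frame values
    # Hypothetical amplitude coefficients for 3 harmonics.
    amp_c = np.array([[1.0, 0.2, 0.0, 0.0],
                      [0.5, 0.1, 0.0, 0.0],
                      [0.25, 0.0, 0.0, 0.0]])
    n = 1920  # 120 ms at 16 kHz
    x = synth_harmonics(f0_c, amp_c, n, fs)                   # original timing/pitch
    y = synth_harmonics(f0_c, amp_c, 3 * n // 2, fs, 1.2)     # 1.5x longer, 20% higher
    print(len(x), len(y), np.round(f0_c, 2))
```

Because the basis is defined on normalised phoneme time, a time-scale modification here amounts to evaluating the same coefficients at a different number of samples, and a pitch modification to scaling the f0 trajectory. The sketch simply keeps each harmonic's amplitude trajectory when the pitch is scaled, which shifts the spectral envelope along with the pitch. A practical system would instead resample the amplitudes from a spectral envelope.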
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2007.367005