Efficient integrated response generation from multiple targets using weighted finite state transducers
Published in: | Computer Speech & Language 2002-07, Vol.16 (3), p.533-550 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural sounding synthesis. Specifically, we allow multiple wordings of the response and multiple prosodic realizations of the different wordings. The choice of wording and prosodic structure are then jointly optimized with unit selection for waveform generation in speech synthesis. Results of perceptual experiments show that by integrating the steps of language generation and speech synthesis, we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation. |
---|---|
ISSN: | 0885-2308, 1095-8363 |
DOI: | 10.1016/S0885-2308(02)00023-2 |
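
The abstract contrasts a sequential pipeline (fix the wording, then the prosody, then select units) with joint optimization over all three choices. The toy sketch below illustrates only that contrast. It is not the paper's weighted finite state transducer implementation: the wording and prosody labels (`w1`, `p1`, ...) and every cost value are hypothetical, and a plain exhaustive search stands in for a best-path search over composed transducers.

```python
# Toy illustration only: labels and integer costs are hypothetical,
# and exhaustive search stands in for a WFST best-path search.

# Cost of each alternative wording of one response.
WORDING_COST = {"w1": 0, "w2": 5}

# Cost of each prosodic realization, given the wording.
PROSODY_COST = {
    ("w1", "p1"): 0, ("w1", "p2"): 4,
    ("w2", "p1"): 2, ("w2", "p2"): 1,
}

# Cost of the best unit sequence that realizes each (wording, prosody) target.
UNIT_COST = {
    ("w1", "p1"): 30, ("w1", "p2"): 15,
    ("w2", "p1"): 8, ("w2", "p2"): 20,
}


def total_cost(wording, prosody):
    """Sum the three stage costs for one complete candidate."""
    key = (wording, prosody)
    return WORDING_COST[wording] + PROSODY_COST[key] + UNIT_COST[key]


def sequential_choice():
    """Commit to the cheapest wording, then the cheapest prosody for it."""
    wording = min(WORDING_COST, key=WORDING_COST.get)
    options = [p for (w, p) in PROSODY_COST if w == wording]
    prosody = min(options, key=lambda p: PROSODY_COST[(wording, p)])
    return wording, prosody, total_cost(wording, prosody)


def joint_choice():
    """Pick the candidate with the lowest summed cost over all stages."""
    wording, prosody = min(PROSODY_COST, key=lambda wp: total_cost(*wp))
    return wording, prosody, total_cost(wording, prosody)


if __name__ == "__main__":
    print("sequential:", sequential_choice())
    print("joint:     ", joint_choice())
```

With these made-up costs, the sequential strategy locks in a wording and prosody that are expensive to synthesize, while the joint search accepts slightly costlier generation choices to reach a much lower overall cost. That is the kind of trade-off that flexible targets make available.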