Perceptual consequences of nasal consonant “surrogates” in English: Implications for speech synthesis

Bibliographic details
Published in:	The Journal of the Acoustical Society of America, 2004-05, Vol. 115 (5_Supplement), p. 2543
Main authors:	Hertz, Susan R.; Spence, Isaac C.; Church, Thomas F.; Goldhor, Richard
Format:	Article
Language:	English
Online access:	Full text

Description
Abstract:	Experiments indicate that non-nasal obstruents in human utterances can be replaced by “surrogate” segments, either produced by formant synthesis or recorded from other speakers, with virtually no change in speech quality or speaker identity [Hertz, Proc. IEEE 2002 Workshop on Speech Synthesis (2002)]. While the durational and spectral properties of the surrogate segments must be broadly appropriate to their target context, no speaker-specific tailoring is required. This paper describes follow-on experiments studying the perceptual consequences of replacing nasal consonants in human utterances with surrogate segments from different phonetic contexts, either synthesized or spoken by other speakers. These experiments indicate that the manipulated speech sounds natural when surrogate segment durations, and the formant transitions and nasalization characteristics of adjacent vowels, are appropriate. In certain contexts, F0 is also perceptually salient. The spectral characteristics of surrogate nasal murmurs are often unimportant. In many cases, the perceived speech quality, phoneme identity, and speaker identity are unaffected even by a surrogate from a phoneme differing from the original. This paper highlights the perceptual results and explains their relevance to hybrid synthesis techniques that employ cross-speaker waveform concatenation and/or integrate waveform concatenation with formant synthesis. Utterances that exemplify these results will be played.
ISSN:	0001-4966 1520-8524
DOI:	10.1121/1.4783633