Waveguide physical modeling of vocal tract acoustics: flexible formant bandwidth control from increased model dimensionality

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2006-05, Vol.14 (3), p.964-971
Main authors:	Mullen, J., Howard, D.M., Murphy, D.T.
Format:	Article
Language:	English
Subjects:	Acoustic propagation; Acoustic resonators; Acoustic scattering; Acoustic waveguides; Acoustic waves; Acoustics; Applied sciences; Bandwidth; Exact sciences and technology; Human voice; Impedance; Information, signal and communications theory; Natural languages; Signal processing; Speech; Speech processing; Speech synthesis; Studies; Telecommunications and information theory; Two dimensional vocal system; Vocal tract; Vowels; Wave propagation; Waveguide components; Waveguides

Description
Abstract:	Digital waveguide physical modeling is often used as an efficient representation of acoustical resonators such as the human vocal tract. Building on the basic one-dimensional (1-D) Kelly-Lochbaum tract model, various speech synthesis techniques demonstrate improvements to the wave scattering mechanisms in order to better approximate wave propagation in the complex vocal system. Some of these techniques are discussed in this paper, with particular reference to an alternative approach in the form of a two-dimensional (2-D) waveguide mesh model. Emphasis is placed on its ability to produce vowel spectra similar to those present in natural speech, and on how it improves upon the 1-D model. The tract area function is accommodated as model width, rather than translated into acoustic impedance, and as such offers extra control as an additional bounding limit to the model. Results show that the 2-D model introduces approximately linear control over formant bandwidths, making realistic values attainable across a range of vowels. Similarly, the 2-D model allows for the application of theoretical reflection values within the tract, which, when applied to the 1-D model, result in small formant bandwidths and, hence, unnatural-sounding synthesized vowels.
ISSN:	1558-7916 2329-9290 1558-7924 2329-9304
DOI:	10.1109/TSA.2005.858052
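
Illustrative note: the abstract contrasts the 2-D mesh with the 1-D Kelly-Lochbaum model, in which the tract area function is translated into acoustic impedance and hence into a reflection coefficient at each junction between tube sections. The record gives no equations, so the following is only a minimal sketch of the textbook 1-D relation for pressure waves, not the paper's implementation or its 2-D mesh. The function names and the example area values are hypothetical.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction of a tube model.

    areas: cross-sectional areas of adjacent tube sections (any consistent
    unit), ordered along the tract. Since impedance is proportional to
    1/area, the coefficient at the junction between sections k and k+1 is
    r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}).
    """
    if any(a <= 0 for a in areas):
        raise ValueError("all areas must be positive")
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def scatter(r, p_right_in, p_left_in):
    """One Kelly-Lochbaum scattering junction for pressure waves.

    p_right_in: right-going wave arriving from the left-hand section.
    p_left_in:  left-going wave arriving from the right-hand section.
    Returns (p_right_out, p_left_out): the wave transmitted into the
    right-hand section and the wave sent back into the left-hand section.
    """
    p_right_out = (1 + r) * p_right_in - r * p_left_in
    p_left_out = r * p_right_in + (1 - r) * p_left_in
    return p_right_out, p_left_out


if __name__ == "__main__":
    # Hypothetical area function, for illustration only (not measured data).
    example_areas = [2.0, 1.0, 3.0]
    for k, r in enumerate(reflection_coefficients(example_areas)):
        print(f"junction {k}: r = {r:+.3f}")
```

In this 1-D form the only losses come from the reflection values chosen at the tract ends, which is the setting in which the abstract reports small formant bandwidths when theoretical reflection values are used.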