Interpretable parametric voice conversion functions based on Gaussian mixture models and constrained transformations

Bibliographic details
Published in: Computer speech & language 2015-03, Vol.30 (1), p.3-15
Main authors: Erro, Daniel; Alonso, Agustin; Serrano, Luis; Navas, Eva; Hernaez, Inma
Format: Article
Language: English
Online access: Full text
Description
Summary:
• New voice conversion functions based on bilinear frequency warping and constrained amplitude scaling.
• Good overall conversion performance using more intuitive and informative parameters.
• Useful as an analysis tool to visualize spectral differences between different voices or styles.

Voice conversion functions based on Gaussian mixture models and parametric speech signal representations are opaque in the sense that it is not straightforward to interpret the physical meaning of the conversion parameters. Following the line of recent works based on the frequency warping plus amplitude scaling paradigm, in this article we show that voice conversion functions can be designed according to physically meaningful constraints in such a manner that they become highly informative. The resulting voice conversion method can be used to visualize the differences between source and target voices or styles in terms of formant location in frequency, spectral tilt, and amplitude in a number of spectral bands.
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2014.03.001