Statistical Voice Conversion Based on Noisy Channel Model

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2012-08, Vol.20 (6), p.1784-1794
Main authors:	Saito, D., Watanabe, S., Nakamura, A., Minematsu, N.
Format:	Article
Language:	English
Subjects:	Applied sciences; Channel models; Channels; Conversion; Density; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Gaussian; Information, signal and communications theory; Joint density model; Joints; Linguistics; Mathematical models; Noise measurement; noisy channel model; Pragmatics; probabilistic integration; Signal and communications theory; Signal processing; Signal, noise; speaker model; Speech; Speech enhancement; Speech processing; Telecommunications and information theory; Trains; Vectors; Voice; voice conversion (VC)

Description
Abstract:	This paper describes a novel voice conversion framework that effectively uses both a joint density model and a speaker model. In voice conversion studies, approaches based on the Gaussian mixture model (GMM) with probability densities of joint vectors of a source and a target speaker are widely used to estimate a transform function between the two speakers. However, to achieve sufficient quality, these approaches require a parallel corpus containing many utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from overtraining when the amount of training data is small. To compensate for these problems, we propose a voice conversion framework that integrates the speaker GMM of the target with the joint density model using a noisy channel model. The proposed method independently trains the joint density model on a few parallel utterances and the speaker model on nonparallel data of the target. This eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.
ISSN:	1558-7916 2329-9290 1558-7924 2329-9304
DOI:	10.1109/TASL.2012.2188628
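
Note:	The integration described in the abstract can be read through the generic noisy channel decomposition, which is simply Bayes' rule. The sketch below is not taken from this record: the symbols x (source speaker features) and y (target speaker features) are assumed notation for illustration, and the paper's exact formulation may differ.

```latex
\hat{\mathbf{y}}
  = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})
  = \arg\max_{\mathbf{y}} \frac{P(\mathbf{x} \mid \mathbf{y})\, P(\mathbf{y})}{P(\mathbf{x})}
  = \arg\max_{\mathbf{y}} P(\mathbf{x} \mid \mathbf{y})\, P(\mathbf{y})
```

Under this reading, the channel term P(x | y) would come from the joint density model trained on the few parallel utterances, and the prior P(y) from the target speaker GMM trained on nonparallel data. The denominator P(x) does not depend on y and so drops out of the maximization.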