IBM Voice Conversion Systems for 2007 TC-STAR Evaluation

This paper proposes a novel voice conversion method by frequency warping. The frequency warping function is generated based on mapping formants of the source speaker and the target speaker. In addition to frequency warping, fundamental frequency adjustment, spectral envelope equalization, breathines...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Tsinghua science and technology 2008-08, Vol.13 (4), p.510-514
Hauptverfasser: Shuang, Zhiwei, Bakis, Raimo, Qin, Yong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper proposes a novel voice conversion method by frequency warping. The frequency warping function is generated based on mapping formants of the source speaker and the target speaker. In addition to frequency warping, fundamental frequency adjustment, spectral envelope equalization, breathiness addition, and duration modification are also used to improve the similarity to the target speaker. The proposed voice conversion method needs only a very small amount of training data for generating the warping function, thereby greatly facilitating its application. Systems based on the proposed method were used for the 2007 TC-STAR intra-lingual voice conversion evaluation for English and Spanish and a cross-lingual voice conversion evaluation for Spanish. The evaluation results show that the proposed method can achieve a much better quality of converted speech than other methods as well as a good balance between quality and similarity. The IBM1 system was ranked No. 1 for English evaluation and No. 2 for Spanish evaluation. Evaluation results also show that the proposed method is a convenient and competitive method for cross-lingual voice conversion tasks.
ISSN:1007-0214
1878-7606
1007-0214
DOI:10.1016/S1007-0214(08)70081-6