Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech

Detailed description

Bibliographic details
Published in: Speech Communication, 2012, Vol. 54 (1), pp. 134-146
Main authors: Nakamura, Keigo; Toda, Tomoki; Saruwatari, Hiroshi; Shikano, Kiyohiro
Format: Article
Language: English
Description
Summary:
► We convert electrolaryngeal (EL) speech to normal speech using voice conversion (VC).
► VC enhances the naturalness of EL speech.
► VC also suppresses the high power of the radiated sound source signals.
► An air-pressure sensor is effective for estimating more reasonable F0 contours.

An electrolarynx (EL) is a medical device that generates sound source signals to give laryngectomees a voice. This article focuses on two problems of speech produced with an EL (EL speech). First, EL speech is extremely unnatural. Second, an EL generates high-energy sound source signals, which often annoy people nearby. To address these two problems, we propose three speaking-aid systems that enhance three different types of EL speech signals: EL speech, EL speech using an air-pressure sensor (EL-air speech), and silent EL speech. The air-pressure sensor lets a laryngectomee control the F0 contour of EL speech using air exhaled through the tracheostoma. Silent EL speech is produced with a new sound source unit that generates signals with extremely low energy. The systems address the poor quality of EL speech using VC, which transforms acoustic features so that the speech sounds as if it were uttered by another person. They estimate spectral parameters, F0, and aperiodic components independently. Experimental evaluations show that using an air-pressure sensor dramatically improves F0 estimation accuracy, and that the converted speech is preferred to the source EL speech.
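The summary does not spell out the conversion function, so the following is only a minimal sketch of a frame-wise mapping of the kind commonly used in GMM-based voice conversion, not the authors' implementation. It assumes a joint-density GMM over source and target features that has already been trained, and it uses scalar features to keep the arithmetic visible. The function names, the dictionary keys, and all numeric parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x, components):
    """Map one source feature x to an estimated target feature.

    Each component is a dict with hypothetical keys:
      weight  - mixture weight
      mean_x  - mean of the source feature
      mean_y  - mean of the target feature
      var_x   - variance of the source feature
      cov_yx  - covariance between target and source features
    The estimate is the posterior-weighted sum of the per-component
    conditional means E[y | x, m].
    """
    # Unnormalized posterior of each mixture component given x.
    scores = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("x is too far from every component (numerical underflow)")

    y_hat = 0.0
    for score, c in zip(scores, components):
        posterior = score / total
        # Conditional mean of y given x under component m.
        cond_mean = c["mean_y"] + (c["cov_yx"] / c["var_x"]) * (x - c["mean_x"])
        y_hat += posterior * cond_mean
    return y_hat


# Hypothetical two-component model; the values are illustrative only.
EXAMPLE_GMM = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 0.0, "var_x": 1.0, "cov_yx": 0.0},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 4.0, "var_x": 1.0, "cov_yx": 0.0},
]

if __name__ == "__main__":
    print(convert_frame(0.0, EXAMPLE_GMM))
```

Because the summary states that spectral parameters, F0, and aperiodic components are estimated independently, a mapping of this form would be applied separately to each feature stream, each with its own model. Real systems use vector-valued features and full covariance matrices in place of the scalars shown here.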
ISSN: 0167-6393, 1872-7182
DOI: 10.1016/j.specom.2011.07.007