Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement
In this paper, we present statistical approaches to enhance body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called nonaudible murmur (NAM) microphone is effectively used to detect very soft unvoiced speech such as NAM or a whispered voice while keeping sp...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on audio, speech, and language processing speech, and language processing, 2012-11, Vol.20 (9), p.2505-2517 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this paper, we present statistical approaches to enhance body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called nonaudible murmur (NAM) microphone is effectively used to detect very soft unvoiced speech such as NAM or a whispered voice while keeping speech sounds emitted outside almost inaudible. However, body-conducted unvoiced speech is difficult to use in human-to-human speech communication because it sounds unnatural and less intelligible owing to the acoustic change caused by body conduction. To address this issue, voice conversion (VC) methods from NAM to normal speech (NAM-to-Speech) and to a whispered voice (NAM-to-Whisper) are proposed, where the acoustic features of body-conducted unvoiced speech are converted into those of natural voices in a probabilistic manner using Gaussian mixture models (GMMs). Moreover, these methods are extended to convert not only NAM but also a body-conducted whispered voice (BCW) as another type of body-conducted unvoiced speech. Several experimental evaluations are conducted to demonstrate the effectiveness of the proposed methods. The experimental results show that 1) NAM-to-Speech effectively improves intelligibility but it causes degradation of naturalness owing to the difficulty of estimating natural fundamental frequency contours from unvoiced speech; 2) NAM-to-Whisper significantly outperforms NAM-to-Speech in terms of both intelligibility and naturalness; and 3) a single conversion model capable of converting both NAM and BCW is effectively developed in our proposed VC methods. |
---|---|
ISSN: | 1558-7916 2329-9290 1558-7924 2329-9304 |
DOI: | 10.1109/TASL.2012.2205241 |