Bone-conducted speech enhancement using deep denoising autoencoder

Bibliographic details
Published in:	Speech communication 2018-11, Vol.104, p.106-112
Main authors:	Liu, Hung-Ping; Tsao, Yu; Fuh, Chiou-Shann
Format:	Article
Language:	English
Subjects:	Automatic speech recognition; Bone-conduction microphone; Deep denoising autoencoder; Denoising; Frequencies; Frequency response; Intelligibility; Microphones; Noise; Noise reduction; Sound pressure; Speech; Speech enhancement; Speech processing; Speech recognition; Voice recognition
Online access:	Full text

Description
Abstract:	Bone-conduction microphones (BCMs) capture speech signals based on the vibrations of the speaker's skull and exhibit better noise-resistance capabilities than normal air-conduction microphones (ACMs) when transmitting speech signals. Because BCMs only capture the low-frequency portion of speech signals, their frequency response is quite different from that of ACMs. When replacing an ACM with a BCM, we may obtain satisfactory results with respect to noise suppression, but the speech quality and intelligibility may be degraded due to the nature of the solid vibration. The mismatched characteristics of BCM and ACM can also impact the automatic speech recognition (ASR) performance, and it is infeasible to recreate a new ASR system using the voice data from BCMs. In this study, we propose a novel deep-denoising autoencoder (DDAE) approach to bridge BCM and ACM in order to improve speech quality and intelligibility, and the current ASR could be employed directly without recreating a new system. Experimental results first demonstrated that the DDAE approach can effectively improve speech quality and intelligibility based on standardized evaluation metrics. Moreover, our proposed system can significantly improve the ASR performance by a notable 48.28% relative character error rate (CER) reduction (from 14.50% to 7.50%) under quiet conditions. In an actual noisy environment (sound pressure from 61.7 dBA to 73.9 dBA), our proposed system with a BCM outperforms an ACM, yielding an 84.46% reduction in the relative CER (proposed system: 9.13% and ACM: 58.75%).
ISSN:	0167-6393 1872-7182
DOI:	10.1016/j.specom.2018.06.002
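
Note on the reported figures: the two relative CER reductions quoted in the abstract can be checked from the absolute CER values it gives. The sketch below assumes the usual definition of relative reduction, (baseline - improved) / baseline; the function name `relative_reduction` is hypothetical and is not taken from the article.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction in percent, going from baseline to improved.

    Assumed definition: (baseline - improved) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Quiet conditions: CER drops from 14.50% to 7.50% (values from the abstract).
quiet = relative_reduction(14.50, 7.50)

# Noisy environment: ACM at 58.75% CER versus the proposed BCM system at 9.13%.
noisy = relative_reduction(58.75, 9.13)

print(f"quiet: {quiet:.2f}%")  # quiet: 48.28%
print(f"noisy: {noisy:.2f}%")  # noisy: 84.46%
```

Both results match the percentages stated in the abstract, so the quoted reductions are relative to the respective baseline CER rather than differences in percentage points.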