A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion

Detailed description

Bibliographic details
Main authors:	Sabzi Shahrebabaki, Abdolreza; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl
Format:	Book
Language:	English

Description
Summary:	In this work, we investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. We claim that DNN vector-to-vector regression for speech enhancement (DNN-SE) can play a key role in AAI when used as a front-end stage that enhances speech features before AAI back-end processing. Our claim contrasts with recent literature reporting a drop in AAI accuracy on MMSE-enhanced data, and thereby sheds some light on the opportunities offered by DNN-SE in robust speech applications. We have also tested single-task and multi-task training strategies for the DNN-SE block and found experimentally that the latter benefits AAI. Moreover, DNN-SE coupled with a deep AAI system tested on enhanced speech can outperform a multi-condition deep AAI system tested on noisy speech. We assess our approach on the Haskins corpus using Pearson's correlation coefficient (PCC). A 15% relative PCC improvement is observed over a multi-condition AAI system at a signal-to-noise ratio (SNR) of 0 dB. Our approach also compares favorably against a conventional DSP front end, namely MMSE with IMCRA.
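
The summary reports results as Pearson's correlation coefficient (PCC) and as a relative PCC improvement. The following is a minimal sketch of how those two quantities are commonly computed, not the authors' evaluation code. The function names `pearson_cc` and `relative_improvement` are hypothetical, and the sketch assumes that a measured and a predicted articulatory trajectory are available as equal-length sequences of numbers.

```python
import math


def pearson_cc(measured, predicted):
    """Pearson's correlation coefficient between two equal-length trajectories."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Centre both trajectories on their means.
    dm = [m - mean_m for m in measured]
    dp = [p - mean_p for p in predicted]
    covariance = sum(a * b for a, b in zip(dm, dp))
    norm = math.sqrt(sum(a * a for a in dm) * sum(b * b for b in dp))
    if norm == 0.0:
        raise ValueError("PCC is undefined for a constant trajectory")
    return covariance / norm


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline
```

As an illustration with made-up numbers that are not taken from the paper, `relative_improvement(0.60, 0.69)` evaluates to approximately 15, which is what a 15% relative PCC improvement means.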