Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array

This paper introduces a nonlinear vector-based feature mapping approach to extract robust features for automatic speech recognition (ASR) of overlapping speech using a microphone array. We explore different configurations and additional sources of information to improve the effectiveness of the feat...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE/ACM transactions on audio, speech, and language processing speech, and language processing, 2014-12, Vol.22 (12), p.2244-2255
Hauptverfasser:	Weifeng Li, Longbiao Wang, Yicong Zhou, Dines, John, Magimai-Doss, Mathew, Bourlard, Herve, Qingmin Liao
Format:	Artikel
Sprache:	eng
Schlagworte:	Acoustic mapping Arrays Beamforming Energy use Feature extraction Mapping microphone array Microphones Natural language processing neural network Nonlinearity Speech Speech enhancement Speech recognition speech separation Vectors
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This paper introduces a nonlinear vector-based feature mapping approach to extract robust features for automatic speech recognition (ASR) of overlapping speech using a microphone array. We explore different configurations and additional sources of information to improve the effectiveness of the feature mapping. First, we investigate the full-vector based mapping of different sources in a log mel-filterbank energy (log MFBE) domain, and demonstrate that retraining the acoustic model using the generated training data can help improve the recognition performance. Then we investigate the feature mapping between different domains. Finally in order to improve the qualities of the mapping inputs we propose a nonlinear mapping of the features from multiple beamformed sources, which are directed at the target and interfering speakers, respectively. We demonstrate the effectiveness of the proposed approach through extensive evaluations on the MONC corpus, which includes non-overlapping single speaker and overlapping multi-speaker conditions.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2014.2364130