Speaker and Noise Factorization for Robust Speech Recognition



Bibliographic Details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2012-09, Vol. 20 (7), p. 2149-2158
Main Authors: Wang, Yongqiang; Gales, M. J. F.
Format: Article
Language: English
Online Access: Order full text
Description
Summary: Speech recognition systems need to operate in a wide range of conditions. Thus they should be robust to extrinsic variability caused by various acoustic factors, for example speaker differences, transmission channel, and background noise. For many scenarios, multiple factors simultaneously impact the underlying "clean" speech signal. This paper examines techniques to handle both speaker and background noise differences. An acoustic factorization approach is adopted. Here, separate transforms are assigned to represent the speaker [maximum-likelihood linear regression (MLLR)] and the noise and channel [model-based vector Taylor series (VTS)] factors. This is a highly flexible framework compared to the standard approaches of modeling the combined impact of both speaker and noise factors. For example, factorization allows the speaker characteristics obtained in one noise condition to be applied to a different environment. To obtain this factorization, modified versions of MLLR and VTS training and application are derived. The proposed scheme is evaluated for both adaptation and factorization on the AURORA4 data.
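The factorization idea in the abstract can be illustrated with a minimal sketch: a speaker-specific MLLR affine transform is applied to a clean-speech Gaussian mean, and the result is then passed through a VTS-style mismatch function for the noise factor. This is not the paper's implementation; the function names, the simplified log-spectral-domain mismatch (the paper's cepstral-domain VTS additionally involves the DCT matrix and channel term), and all numeric values are illustrative assumptions.

```python
import numpy as np

def mllr_transform(mu_clean, A, b):
    # Speaker factor (MLLR): affine transform of the clean-speech mean.
    return A @ mu_clean + b

def vts_compensate(mu_speaker, mu_noise):
    # Noise factor: first-order VTS-style mismatch function, simplified to
    # the log-spectral domain (cepstral VTS also uses the DCT matrix and a
    # channel term, omitted here for brevity).
    return np.log(np.exp(mu_speaker) + np.exp(mu_noise))

# Hypothetical 3-dimensional log-spectral means.
mu_clean = np.array([1.0, 2.0, 0.5])
A = np.eye(3)                      # identity speaker transform for illustration
b = np.array([0.1, -0.2, 0.0])
mu_noise = np.zeros(3)

# Because the speaker transform is estimated separately from the noise model,
# the same (A, b) can be re-applied under a different mu_noise -- the
# flexibility the factorization framework provides.
mu_speaker = mllr_transform(mu_clean, A, b)
mu_noisy = vts_compensate(mu_speaker, mu_noise)
```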
ISSN:1558-7916
2329-9290
1558-7924
2329-9304
DOI:10.1109/TASL.2012.2198059