Speaker and Noise Factorization for Robust Speech Recognition
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2012-09, Vol. 20 (7), p. 2149-2158
Main authors:
Format: Article
Language: English
Online access: Order full text
Summary: Speech recognition systems need to operate in a wide range of conditions. They should therefore be robust to extrinsic variability caused by various acoustic factors, for example, speaker differences, transmission channel, and background noise. In many scenarios, multiple factors simultaneously impact the underlying "clean" speech signal. This paper examines techniques to handle both speaker and background-noise differences. An acoustic factorization approach is adopted: separate transforms are assigned to represent the speaker factor [maximum-likelihood linear regression (MLLR)] and the noise and channel factors [model-based vector Taylor series (VTS)]. This is a highly flexible framework compared to the standard approaches, which model the combined impact of both speaker and noise factors. For example, factorization allows the speaker characteristics obtained in one noise condition to be applied in a different environment. To obtain this factorization, modified versions of MLLR and VTS training and application are derived. The proposed scheme is evaluated for both adaptation and factorization on the AURORA4 data.
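The factorized compensation described in the summary pairs an affine MLLR speaker transform with VTS noise compensation of the Gaussian means. A minimal sketch, using a simplified log-spectral-domain VTS mismatch function and hypothetical function names (the paper works in the cepstral domain with DCT matrices and a channel term, omitted here):

```python
import numpy as np

def mllr_transform(mu_clean, A, b):
    """Speaker factor: affine MLLR transform of a clean-speech Gaussian mean."""
    return A @ mu_clean + b

def vts_compensate(mu_speech, mu_noise):
    """Noise factor: zeroth-order VTS approximation in the log-spectral domain,
    log(exp(x) + exp(n)) = x + log(1 + exp(n - x)), evaluated at the means."""
    return mu_speech + np.log1p(np.exp(mu_noise - mu_speech))

def factorized_mean(mu_clean, A, b, mu_noise):
    """Acoustic factorization: speaker transform first, then noise compensation.
    Because the two factors are kept separate, the same (A, b) estimated in one
    condition can be reused under a different mu_noise (a new environment)."""
    return vts_compensate(mllr_transform(mu_clean, A, b), mu_noise)

# Reusing one speaker transform across two noise conditions (illustrative values):
mu, A, b = np.zeros(3), np.eye(3), np.full(3, 0.5)
quiet = factorized_mean(mu, A, b, mu_noise=np.full(3, -10.0))
loud = factorized_mean(mu, A, b, mu_noise=np.full(3, 2.0))
```

This separation is the flexibility the summary refers to: the standard joint approach would need a new combined transform for each (speaker, noise) pair, whereas here only `mu_noise` changes.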
ISSN: 1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI: 10.1109/TASL.2012.2198059