A Statistical Approach to Mel-Domain Mask Estimation for Missing-Feature ASR

Bibliographic Details
Published in:	IEEE Signal Processing Letters, 2010-11, Vol. 17 (11), pp. 941-944
Main authors:	Borgström, Bengt J; Alwan, Abeer
Format:	Article
Language:	English
Subjects:	χ² random variables; Decoding; Degrees of freedom; Estimation; Exact solutions; Mask estimation; Masks; Mathematical analysis; Missing features; Noise; Noise robust ASR; Spectra; Speech; Speech presence uncertainty; Speech recognition; Temporal logic; Time frequency analysis; Training

Description
Abstract:	In this letter, we present a statistical approach to Mel-domain mask estimation for missing feature (MF)-based automatic speech recognition (ASR). Mel-domain time-frequency masks are of interest because MF systems have been shown to be successful in that domain. Time- and channel-specific reliability measures are derived as posterior probabilities of active speech using a 2-state speech model. Since closed-form distributions for Mel-domain spectra do not exist, they are instead modeled as χ² processes with empirically determined degrees of freedom. Additionally, we present HMM-based decoding to exploit the temporal correlation of spectral speech data. The proposed mask estimation algorithm is integrated with an example MF-based ASR front-end and is shown to outperform a spectral subtraction (SS)-based method in terms of word accuracy when applied to the Aurora-2 database.
ISSN:	1070-9908, 1558-2361
DOI:	10.1109/LSP.2010.2076348
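
Illustrative sketch:	The abstract describes reliability measures as posterior probabilities of active speech under a 2-state model, with Mel-domain spectra modeled as χ² processes and temporal correlation exploited through HMM-based decoding. The Python sketch below shows one way such a scheme could look for a single Mel channel. It is not the authors' derivation: the scaled-χ² (Gamma) likelihood parameterization, the function names, the forward-backward smoothing and every numeric value (degrees of freedom, transition probability, priors, powers) are assumptions chosen for illustration only.

```python
import math


def scaled_chi2_logpdf(y, mean, dof):
    """Log-density of a scaled chi-square variable with the given mean and
    degrees of freedom, i.e. Gamma(shape=dof/2, scale=2*mean/dof). Needs y > 0."""
    shape = dof / 2.0
    scale = 2.0 * mean / dof
    return ((shape - 1.0) * math.log(y) - y / scale
            - shape * math.log(scale) - math.lgamma(shape))


def frame_likelihoods(y, noise_pow, speech_pow, dof):
    """Relative likelihoods of one power value under (noise only, speech + noise).
    Both are divided by the larger one, which leaves posteriors unchanged
    and avoids underflow."""
    l0 = scaled_chi2_logpdf(y, noise_pow, dof)
    l1 = scaled_chi2_logpdf(y, noise_pow + speech_pow, dof)
    m = max(l0, l1)
    return math.exp(l0 - m), math.exp(l1 - m)


def speech_presence_posteriors(powers, noise_pow, speech_pow,
                               dof=4.0, stay=0.9, prior_speech=0.5):
    """Forward-backward smoothing of a 2-state (noise / speech) chain for one
    channel. Returns P(speech active | all frames) for every frame.
    All default parameter values are hypothetical."""
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    liks = [frame_likelihoods(y, noise_pow, speech_pow, dof) for y in powers]
    n = len(liks)

    # Forward pass, normalized at each frame.
    alpha = []
    for t, lik in enumerate(liks):
        if t == 0:
            pred = [1.0 - prior_speech, prior_speech]
        else:
            pred = [sum(alpha[-1][r] * trans[r][s] for r in (0, 1)) for s in (0, 1)]
        a = [pred[s] * lik[s] for s in (0, 1)]
        norm = sum(a)
        alpha.append([v / norm for v in a])

    # Backward pass, normalized at each frame.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * liks[t + 1][r] * beta[t + 1][r] for r in (0, 1))
             for s in (0, 1)]
        norm = sum(b)
        beta[t] = [v / norm for v in b]

    # Combine both passes into the per-frame speech posterior.
    post = []
    for t in range(n):
        g0 = alpha[t][0] * beta[t][0]
        g1 = alpha[t][1] * beta[t][1]
        post.append(g1 / (g0 + g1))
    return post


if __name__ == "__main__":
    # Hypothetical channel powers: three noise-like frames, three speech-like, one noise-like.
    powers = [1.0, 1.2, 0.8, 9.0, 11.0, 10.0, 1.1]
    for y, p in zip(powers, speech_presence_posteriors(powers, 1.0, 9.0)):
        print(f"power={y:5.1f}  P(speech)={p:.3f}")
```

The resulting posteriors could be used directly as soft reliability values or thresholded into a binary mask; which of these a particular MF front-end expects is outside the scope of this sketch.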