A Statistical Approach to Mel-Domain Mask Estimation for Missing-Feature ASR


Bibliographic details
Published in: IEEE Signal Processing Letters, 2010-11, Vol. 17 (11), pp. 941-944
Main authors: Borgström, Bengt J; Alwan, Abeer
Format: Article
Language: English
Description
Abstract: In this letter, we present a statistical approach to Mel-domain mask estimation for missing-feature (MF)-based automatic speech recognition (ASR). Mel-domain time-frequency masks are of interest because MF systems have been shown to be successful in that domain. Time- and channel-specific reliability measures are derived as posterior probabilities of active speech using a 2-state speech model. Since closed-form distributions for Mel-domain spectra do not exist, they are instead modeled as χ² processes with empirically determined degrees of freedom. Additionally, we present HMM-based decoding to exploit the temporal correlation of spectral speech data. The proposed mask estimation algorithm is integrated with an example MF-based ASR front-end from the literature and is shown to outperform a previously published spectral subtraction (SS)-based method in terms of word accuracy when applied to the Aurora-2 database.
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2010.2076348
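
Illustration: The abstract describes reliability measures computed as posterior probabilities of active speech under a 2-state model with χ²-distributed Mel-domain spectra, followed by HMM-based decoding. The following is a minimal sketch of that general idea for a single Mel channel, not the authors' implementation. Every name and number in it is an assumption made for illustration: the functions scaled_chi2_logpdf, frame_posteriors and hmm_smooth, the parameters noise_mean, speech_mean, dof, prior_speech and stay, the choice to parameterize each state by its mean power, and the use of forward-backward smoothing as the HMM step.

```python
import math

def scaled_chi2_logpdf(x, mean, dof):
    """Log-density of a chi-square variable with `dof` degrees of freedom,
    rescaled so that its mean equals `mean` (a Gamma(dof/2, mean/(dof/2)) law).
    Requires x > 0."""
    shape = dof / 2.0
    scale = mean / shape
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def _state_loglikes(x, noise_mean, speech_mean, dof):
    # State 0: noise only. State 1: speech active (assumed additive in power).
    l0 = scaled_chi2_logpdf(x, noise_mean, dof)
    l1 = scaled_chi2_logpdf(x, noise_mean + speech_mean, dof)
    return l0, l1

def frame_posteriors(power, noise_mean, speech_mean, dof, prior_speech=0.5):
    """Per-frame posterior probability of the speech-active state."""
    post = []
    for x in power:
        l0, l1 = _state_loglikes(x, noise_mean, speech_mean, dof)
        d = (l1 + math.log(prior_speech)) - (l0 + math.log(1.0 - prior_speech))
        # Numerically stable logistic function of the log-odds d.
        if d >= 0:
            post.append(1.0 / (1.0 + math.exp(-d)))
        else:
            e = math.exp(d)
            post.append(e / (1.0 + e))
    return post

def hmm_smooth(power, noise_mean, speech_mean, dof, stay=0.9, prior_speech=0.5):
    """Speech-active posteriors smoothed over time with a 2-state HMM
    (scaled forward-backward). `stay` is the assumed self-transition probability."""
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    emis = []
    for x in power:
        l0, l1 = _state_loglikes(x, noise_mean, speech_mean, dof)
        m = max(l0, l1)  # rescale per frame to avoid underflow
        emis.append((math.exp(l0 - m), math.exp(l1 - m)))
    if not emis:
        return []
    # Forward pass, normalized at every frame.
    alpha = []
    for t, e in enumerate(emis):
        if t == 0:
            a = [(1.0 - prior_speech) * e[0], prior_speech * e[1]]
        else:
            p = alpha[-1]
            a = [(p[0] * trans[0][j] + p[1] * trans[1][j]) * e[j] for j in (0, 1)]
        s = a[0] + a[1]
        alpha.append([a[0] / s, a[1] / s])
    # Backward pass, normalized at every frame.
    beta = [[1.0, 1.0] for _ in emis]
    for t in range(len(emis) - 2, -1, -1):
        b = [sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in (0, 1))
             for i in (0, 1)]
        s = b[0] + b[1]
        beta[t] = [b[0] / s, b[1] / s]
    # Combine into per-frame posteriors of the speech-active state.
    gamma = []
    for a, b in zip(alpha, beta):
        g0, g1 = a[0] * b[0], a[1] * b[1]
        gamma.append(g1 / (g0 + g1))
    return gamma
```

In this sketch the returned values lie between 0 and 1 and could serve as a soft time-frequency mask for one channel; thresholding them (for example at 0.5) would give a binary mask. In practice the noise and speech parameters and the degrees of freedom would have to be estimated from data, which the sketch does not attempt.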