Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2012-05, Vol.20 (4), p.1362-1371
Main authors:	Fazel, A., Chakrabartty, S.
Format:	Article
Language:	English
Subjects:	Applied sciences; Auditory HMAX; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Feature extraction; gammatone functions; Information, signal and communications theory; Kernel; Psychoacoustic models; reproducing kernel Hilbert space (RKHS); robust speech recognition; Signal and communications theory; Signal processing; Signal, noise; Sparks; sparse features; Speech; Speech processing; Speech recognition; Studies; Telecommunications and information theory; Vectors

Description
Summary:	In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as "Sparse Auditory Reproducing Kernel" (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in the speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, and then prunes the features using a simple pooling technique (the "MAX" operation). We describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy when compared with a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
ISSN:	1558-7916 2329-9290 1558-7924 2329-9304
DOI:	10.1109/TASL.2011.2179294
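
Illustrative sketch: the two stages named in the summary (kernel similarity against time-shifted gammatone functions, then MAX pooling) can be illustrated with a short Python example. This is not the authors' implementation. The Gaussian kernel, the fourth-order gammatone with a fixed 125 Hz bandwidth, the atom length, the hop size, and every function name below are assumptions chosen only to make the idea concrete; the paper itself compares several kernels and hyper-parameters that this record does not list.

```python
import math


def gammatone_atom(fc, fs, length, order=4, bandwidth=125.0):
    """Unit-norm gammatone impulse response sampled at fs.

    The order and bandwidth are assumed values, not taken from the paper.
    """
    atom = []
    for n in range(length):
        t = n / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth * t)
        atom.append(envelope * math.cos(2.0 * math.pi * fc * t))
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, used here as one possible similarity function."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def spark_like_features(frame, center_freqs, fs, atom_len=128, hop=1, sigma=1.0):
    """Return one pooled similarity per center frequency for a single frame."""
    # Normalize the frame so the kernel compares shapes rather than levels.
    norm = math.sqrt(sum(s * s for s in frame)) or 1.0
    x = [s / norm for s in frame]
    features = []
    for fc in center_freqs:
        atom = gammatone_atom(fc, fs, atom_len)
        best = 0.0
        # Stage 1: kernel similarity with every time-shifted copy of the atom.
        for shift in range(0, len(x) - atom_len + 1, hop):
            shifted = [0.0] * shift + atom + [0.0] * (len(x) - shift - atom_len)
            # Stage 2: MAX pooling keeps only the strongest response over shifts.
            best = max(best, gaussian_kernel(x, shifted, sigma))
        features.append(best)
    return features


# Usage: a 1 kHz tone sampled at 8 kHz, analyzed with three hypothetical channels.
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(256)]
features = spark_like_features(tone, [500.0, 1000.0, 2000.0], fs)
```

For this synthetic tone, the channel centered at 1000 Hz should produce the largest pooled value, since only that atom correlates strongly with the input at some time shift. Pooling over shifts discards the exact position of the best match, which is one plausible reading of how the MAX operation prunes the overcomplete representation.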