Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency

Bibliographic details
Published in:	Speech Communication, 2011-05, Vol. 53 (5), p. 707-715
Main authors:	Yin, Hui; Hohmann, Volker; Nadeu, Climent
Format:	Article
Language:	English
Subjects:	Cleaning; Envelopes; Frequency modulation; Gammatone filterbank; Instantaneous frequency; Mathematical models; Recognition; Speech; Speech recognition; Training
Online access:	Full text

Description
Abstract:	Most of the features used by modern automatic speech recognition systems, such as mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) coefficients, represent only the spectral envelope of the speech signal. Nevertheless, phase or frequency modulation, as represented in recent perceptual models of the peripheral auditory system, might also contribute to speech decoding. Furthermore, such features can be complementary to the envelope features. This paper proposes a variety of features based on a linear auditory filterbank, the Gammatone filterbank. Envelope features are derived from the envelope of the subband filter outputs. Phase/frequency modulation is represented by the subband instantaneous frequency (IF) and is used either explicitly, by concatenating envelope-based and IF-based features, or implicitly, by IF-based frequency reassignment. Speech recognition experiments using a standard HMM-based recognizer under both clean training and multi-condition training are conducted on a Mandarin Chinese digits corpus. The experimental results show that the proposed envelope- and phase-based features can improve recognition rates in clean and noisy conditions compared to the reference MFCC-based recognizer.
ISSN:	0167-6393, 1872-7182
DOI:	10.1016/j.specom.2010.04.008
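
Illustrative note: the abstract describes two per-subband quantities, the envelope and the instantaneous frequency (IF). The following is a minimal sketch of how they could be computed for a single band, not the authors' implementation. It assumes a complex-valued, all-pole approximation of a fourth-order gammatone filter; the function names, the pole-radius formula, and the 100 Hz bandwidth are hypothetical choices made for illustration only.

```python
import cmath
import math


def complex_gammatone(x, fc, fs, bandwidth_hz, order=4):
    """Filter x with a cascade of `order` complex one-pole stages centred on fc.

    Assumption: the pole radius is derived from the bandwidth with a simple
    exponential mapping; the paper may parameterise its filters differently.
    """
    radius = math.exp(-2.0 * math.pi * bandwidth_hz / fs)
    pole = radius * cmath.exp(2j * math.pi * fc / fs)
    y = [complex(v) for v in x]
    for _ in range(order):
        state = 0j
        out = []
        for v in y:
            # (1 - radius) normalises each stage to unit gain at fc.
            state = (1.0 - radius) * v + pole * state
            out.append(state)
        y = out
    return y


def envelope_and_if(y, fs):
    """Return the envelope |y[n]| and the IF in Hz from the phase increment."""
    env = [abs(v) for v in y]
    inst_freq = [
        cmath.phase(b * a.conjugate()) * fs / (2.0 * math.pi)
        for a, b in zip(y, y[1:])
    ]
    return env, inst_freq


# Hypothetical usage: a 1050 Hz tone analysed by a band centred on 1000 Hz.
fs = 16000
tone = [math.cos(2.0 * math.pi * 1050.0 * n / fs) for n in range(1600)]
band = complex_gammatone(tone, fc=1000.0, fs=fs, bandwidth_hz=100.0)
env, inst_freq = envelope_and_if(band, fs)
```

Because the band output is complex-valued, its magnitude serves as the envelope and its sample-to-sample phase increment gives the IF, so no separate Hilbert transform is needed in this sketch. Once the filter has settled, the IF follows the frequency of the input tone rather than the centre frequency of the band, which is the kind of information the abstract says envelope-only features leave out.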