A comparison of signal processing front ends for automatic word recognition

This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, u...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on speech and audio processing 1995-07, Vol.3 (4), p.286-293
Hauptverfasser: Jankowski, C.R., Vo, H.-D.H., Lippmann, R.P.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.< >
ISSN:1063-6676
1558-2353
DOI:10.1109/89.397093