Noise-robust speech recognition using multi-band spectral features

Bibliographic details
Published in: The Journal of the Acoustical Society of America, 2004-10, Vol. 116 (4_Supplement), p. 2480
Main authors: Nishimura, Yoshitaka; Shinozaki, Takahiro; Iwano, Koji; Furui, Sadaoki
Format: Article
Language: English
Description
Abstract: Most state-of-the-art automatic speech recognition (ASR) systems convert speech into a time sequence of mel-frequency cepstral coefficient (MFCC) vectors. The problem with MFCCs is that the effects of noise spread over all the coefficients, even when the noise is confined to a narrow frequency band. Using a spectral feature directly avoids this problem, so robustness against noise can be expected to increase. Although various studies have used spectral-domain features, improved recognition performance has been reported only under limited noise conditions. This paper proposes a novel multi-band ASR method that uses a new log-spectral domain feature. To increase robustness, the log-spectrum features are normalized by three processes: subtracting the mean log-energy of each frame, emphasizing spectral peaks, and subtracting the log-spectral mean averaged over an utterance. The spectral component likelihood values in each frame are weighted by the normalized spectral level (spectral peaks) or the SNR of each component. Experiments on speech with added babble noise show that the proposed method improves recognition performance over the MFCC-based method. Performance improves further with spectral-peak weighting and SNR-based frequency-band weighting.
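The three normalization steps and the per-frame likelihood weighting named in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how spectral peaks are emphasized, so the rule used here (scaling components that lie above the frame mean by a hypothetical `peak_gain`) is a placeholder, and normalizing the weights to sum to one is likewise an assumption. The function names are invented for this example.

```python
def normalize_log_spectrum(log_spec, peak_gain=1.5):
    """Normalize an utterance given as a list of frames of log-spectral values.

    peak_gain and the peak-emphasis rule are hypothetical; the abstract
    does not specify how spectral peaks are emphasized.
    """
    emphasized = []
    for frame in log_spec:
        # Step 1: subtract the mean log-energy of this frame.
        frame_mean = sum(frame) / len(frame)
        centered = [v - frame_mean for v in frame]
        # Step 2 (assumed form): scale components above the frame mean,
        # which is now zero, so that peaks stand out from valleys.
        emphasized.append([peak_gain * v if v > 0.0 else v for v in centered])
    # Step 3: subtract each band's mean taken over the whole utterance.
    n_frames = len(emphasized)
    band_means = [sum(band) / n_frames for band in zip(*emphasized)]
    return [[v - m for v, m in zip(frame, band_means)] for frame in emphasized]


def weighted_frame_loglik(component_loglik, weights):
    """Combine one frame's per-component log-likelihoods using weights.

    The weights stand in for the normalized spectral level or the SNR of
    each component; scaling them to sum to one is an assumption.
    """
    total = sum(weights)
    return sum(w / total * ll for w, ll in zip(weights, component_loglik))
```

In this form, a component with a low weight (for example, a band with poor SNR) contributes less to the frame score, which matches the idea of keeping narrow-band noise from affecting the whole feature vector.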
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.4784906