Noise-robust speech recognition using multi-band spectral features
Published in: | The Journal of the Acoustical Society of America 2004-10, Vol. 116 (4_Supplement), p. 2480 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | In most state-of-the-art automatic speech recognition (ASR) systems, speech is converted into a time sequence of MFCC (mel-frequency cepstral coefficient) vectors. The problem with MFCCs, however, is that the effects of noise spread over all the coefficients even when the noise is confined to a narrow frequency band. If a spectral feature is used directly, this problem can be avoided, and robustness against noise can therefore be expected to increase. Although various studies on spectral-domain features have been conducted, improved recognition performance has been reported only under limited noise conditions. This paper proposes a novel multi-band ASR method using a new log-spectral domain feature. To increase robustness, the log-spectrum features are normalized by applying three processes: subtracting the mean log-energy for each frame, emphasizing spectral peaks, and subtracting the log-spectral mean averaged over an utterance. Spectral component likelihood values in each frame are weighted by the normalized spectral level (spectral peaks) or the SNR of each component. Experimental results using speech with added babble noise show that the proposed method improves recognition performance in comparison with the MFCC-based method. The performance is further improved by spectral-peak weighting and SNR-based frequency-band weighting techniques. |
---|---|
ISSN: | 0001-4966 1520-8524 |
DOI: | 10.1121/1.4784906 |
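
The three normalization processes and the per-component likelihood weighting named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so the function names, the `peak_gain` parameter, the rule used to emphasize peaks (scaling values above the frame mean), and the normalization of the weights are all assumptions made for illustration.

```python
import numpy as np


def normalize_log_spectrum(log_spec, peak_gain=1.5):
    """Normalize a (frames, bands) array of log-spectral values for one utterance.

    Hypothetical helper: the three steps follow the abstract, but the exact
    formulas are assumptions.
    """
    log_spec = np.asarray(log_spec, dtype=float)

    # Step 1: subtract the mean log-energy of each frame
    # (assumed here to be the mean over frequency bands within the frame).
    x = log_spec - log_spec.mean(axis=1, keepdims=True)

    # Step 2: emphasize spectral peaks. ASSUMPTION: values above the frame
    # mean (positive after step 1) are scaled by peak_gain; the abstract
    # does not specify the emphasis rule.
    x = np.where(x > 0, x * peak_gain, x)

    # Step 3: subtract the log-spectral mean averaged over the utterance,
    # computed separately for each band.
    return x - x.mean(axis=0, keepdims=True)


def weighted_frame_log_likelihood(band_log_likes, weights):
    """Combine per-band log-likelihoods into one value per frame.

    band_log_likes, weights: (frames, bands) arrays. ASSUMPTION: weights are
    positive (for example derived from spectral level or SNR) and are
    normalized to sum to 1 within each frame.
    """
    band_log_likes = np.asarray(band_log_likes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    w = weights / weights.sum(axis=1, keepdims=True)
    return (w * band_log_likes).sum(axis=1)
```

Because each band is normalized and scored separately in this sketch, noise confined to a few bands affects only those components, and a low weight can reduce their contribution to the frame score.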