A Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant Speech Recognition


Detailed description


Bibliographic details
Published in:	IEEE signal processing letters 2016-06, Vol.23 (6), p.780-784
Main authors:	Byung Joon Cho, Haeyong Kwon, Ji-Won Cho, Chanwoo Kim, Richard M. Stern, Hyung-Min Park
Format:	Article
Language:	English
Subjects:	Algorithms; Harmonic analysis; harmonics; Indexes; Power system harmonics; precedence effect; reverberation; Robust speech recognition; Signal processing algorithms; Speech; Speech processing; Speech recognition; Voice recognition

Description
Summary:	This letter describes a preprocessing method called subband-based stationary-component suppression method using harmonics and power ratio (SHARP) processing for reverberant speech recognition. SHARP processing extends a previous algorithm called Suppression of Slowly varying components and the Falling edge (SSF), which suppresses the steady-state portions of subband spectral envelopes. The SSF algorithm tends to over-subtract these envelopes in highly reverberant environments when there are high levels of power in previous analysis frames. The proposed SHARP method prevents excessive suppression both by boosting the floor value using the harmonics in voiced speech segments and by inhibiting the subtraction for unvoiced speech by detecting frames in which power is concentrated in high-frequency channels. These modifications enable the SHARP algorithm to improve recognition accuracy by further reducing the mismatch between power contours of clean and reverberated speech. Experimental results indicate that the SHARP method provides better recognition accuracy in highly reverberant environments compared to the SSF algorithm. It is also shown that the performance of the SHARP method can be further improved by combining it with feature-space maximum likelihood linear regression (fMLLR).
ISSN:	1070-9908, 1558-2361
DOI:	10.1109/LSP.2016.2554888
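
Illustration:	The summary says that SSF suppresses the steady-state portions of subband spectral envelopes and that the subtraction is limited by a floor value. A minimal sketch of that general idea follows. It is not taken from the letter: the first-order recursive average, the forgetting factor of 0.4, the floor coefficient of 0.01, and the function name suppress_stationary are all assumptions made for illustration. The harmonic floor boost and the high-frequency power-ratio test that SHARP adds are not reproduced here, because the summary gives no detail on them.

```python
import numpy as np


def suppress_stationary(power, forgetting=0.4, floor=0.01):
    """Subtract a slowly varying estimate from each subband power contour.

    power:      array of shape (frames, channels) with non-negative values.
    forgetting: weight on the previous running average (assumed value).
    floor:      fraction of the input power kept as a lower bound (assumed value).
    """
    power = np.asarray(power, dtype=float)
    lowpassed = np.empty_like(power)
    running = power[0]  # start the running average at the first frame
    for m in range(power.shape[0]):
        # First-order recursive average along the frame axis, per channel.
        running = forgetting * running + (1.0 - forgetting) * power[m]
        lowpassed[m] = running
    # Keep whichever is larger: the subtracted power or the floored input.
    return np.maximum(power - lowpassed, floor * power)


# A constant (steady-state) contour is pushed down to the floor, while the
# frame where power first rises keeps a much larger share of its value.
steady = suppress_stationary(np.ones((5, 2)))
onset = suppress_stationary(np.array([[0.0], [0.0], [1.0]]))
```

In this sketch a larger floor leaves more of the input power in place, which is the quantity the summary says SHARP raises for voiced segments to avoid over-subtraction.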