Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

Bibliographic details
Published in:	Pattern analysis and applications : PAA 2020-05, Vol.23 (2), p.527-539
Main authors:	Bhowmick, Anirban; Biswas, Astik; Chandra, Mahesh
Format:	Article
Language:	English
Subjects:	Acoustic noise; Automatic speech recognition; Cochlea; Computer Science; Decomposition; Frames (data processing); Frequency response; Noise; Noise reduction; Pattern Recognition; Performance evaluation; Phonemes; Robustness; Speech recognition; Theoretical Advances; Voice activity detectors; Voice recognition; Wavelet analysis; Wiener filtering
Online access:	Full text

Description
Abstract:	Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system that exploits a psycho-acoustically motivated, wavelet-based front-end compensator. In the front-end compensator block, a voice activity detector based on voiced speech probability separates voiced and unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, because the center frequencies of the ERB distribution resemble the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with different baseline and robust features as well as with existing front-end compensation techniques. Additionally, the proposed front-end compensator is evaluated in terms of phoneme classification accuracy. Performance improvement is observed in all of these experiments.
ISSN:	1433-7541, 1433-755X
DOI:	10.1007/s10044-019-00816-0
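
The front-end steps named in the abstract (an ERB-scaled split into 24 sub-bands, a per-frame average sub-band energy, and a Wiener gain) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the wavelet packet tree nor the threshold rule, so none is reproduced here. The Glasberg-Moore ERB-rate formula, the 0 to 8000 Hz range (TIMIT audio is sampled at 16 kHz), the reading of ASE as the mean over bands of the per-band mean-square energy, and all function names are assumptions made for illustration.

```python
import math


def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate: number of ERBs below f_hz (assumed formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands=24, f_low=0.0, f_high=8000.0):
    """Return n_bands + 1 edge frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]


def average_subband_energy(subbands):
    """Mean over bands of each band's mean-square energy for one frame.

    `subbands` is a list of coefficient lists, one list per sub-band.
    This definition of ASE is an assumption, not taken from the paper.
    """
    energies = [sum(c * c for c in band) / len(band) for band in subbands]
    return sum(energies) / len(energies)


def wiener_gain(signal_power, noise_power):
    """Textbook Wiener gain S / (S + N); returns 0 when both powers are 0."""
    total = signal_power + noise_power
    return signal_power / total if total > 0.0 else 0.0


# 24 bands between 0 Hz and 8 kHz: narrow at low frequencies, wide at high ones.
edges = erb_band_edges()
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

In a real front end the sub-band coefficients would come from a wavelet packet decomposition whose tree approximates these band edges, and the noise power in `wiener_gain` would come from the statistics updated by the voice activity detector.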