Application of an adaptive auditory model to speech recognition

Bibliographic Details
Published in: The Journal of the Acoustical Society of America, 1985-11, Vol. 78 (S1), p. S50
Author: Cohen, Jordan R.
Format: Article
Language: English
Online access: Full text
Description
Summary: One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM 5000-word speech recognition system [Bahl et al., IEEE Trans. Pattern Anal. Machine Intell. PAMI-5, 179–190 (1983)] uses an auditory model in which psychophysical critical-band tuning and loudness estimation are combined with a firing-rate model patterned after that of Schroeder and Hall [J. Acoust. Soc. Am. 55, 1055–1060 (1974)]. The signal processing system consists of a critical-bandwidth filter bank, loudness estimation (intensity to the 1/3 power), and a reservoir-type firing-rate model with one internal state for each band. This model enhances transient events in the auditory signal and causes rapid stimulus offsets to be marked by outputs smaller than the resting rate. The use of this auditory model in the IBM system produces a 4.4% error rate on a standard corpus of four speakers, while the previous filter-bank signal processor produces 7.4% errors on the same data.
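The last two stages described in the summary (cube-root loudness and a one-state reservoir per band) can be illustrated with a minimal Python sketch. This is not the IBM system's code or the Schroeder and Hall equations: the update rule, the class name `ReservoirBand`, and the parameters `spont`, `refill`, and `deplete` are assumptions chosen only to reproduce the qualitative behavior the abstract reports. The critical-bandwidth filter bank is omitted, so the input is assumed to be already a sequence of per-band intensities.

```python
def loudness(intensity):
    """Compress a non-negative band intensity with the 1/3-power law."""
    return intensity ** (1.0 / 3.0)


class ReservoirBand:
    """Illustrative one-state reservoir firing-rate model for a single band.

    All parameter names and values are hypothetical, not taken from the source.
    """

    def __init__(self, spont=1.0, refill=0.05, deplete=0.05):
        self.spont = spont      # spontaneous drive present even in silence
        self.refill = refill    # rate at which the reservoir refills toward 1
        self.deplete = deplete  # fraction of the firing rate drained per step
        # Start the internal state at its resting (silent-input) equilibrium.
        self.level = refill / (refill + deplete * spont)

    def resting_rate(self):
        """Steady-state output when the input loudness is zero."""
        return self.spont * self.refill / (self.refill + self.deplete * self.spont)

    def step(self, loud):
        """Return the firing rate for one frame, then update the reservoir."""
        rate = self.level * (self.spont + loud)
        self.level += self.refill * (1.0 - self.level) - self.deplete * rate
        return rate


def process(frames):
    """Map frames of per-band intensities to frames of per-band firing rates."""
    bands = [ReservoirBand() for _ in frames[0]]
    return [[b.step(loudness(x)) for b, x in zip(bands, frame)] for frame in frames]


if __name__ == "__main__":
    # One band: silence, then a steady tone, then silence again.
    frames = [[0.0]] * 50 + [[512.0]] * 100 + [[0.0]] * 50
    out = process(frames)
    print("resting:", out[0][0])
    print("onset:", out[50][0], "sustained:", out[149][0])
    print("offset:", out[150][0])
```

Under these assumptions, the full reservoir makes the first frame of the tone produce a larger output than the sustained portion, and the depleted reservoir makes the first silent frame after the tone fall below the resting rate before it recovers.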
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.2022857