Application of an adaptive auditory model to speech recognition

One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM...

Bibliographic Details
Published in:	The Journal of the Acoustical Society of America 1985-11, Vol.78 (S1), p.S50-S50
Main Author:	Cohen, Jordan R.
Format:	Article
Language:	eng
Online Access:	Full text

container_end_page	S50
container_issue	S1
container_start_page	S50
container_title	The Journal of the Acoustical Society of America
container_volume	78
creator	Cohen, Jordan R.
description	One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM 5000-word speech recognition system [Bahl et al., IEEE Trans. Pattern Anal. Machine Intell. PAMI-5, 179–190 (1983)] uses an auditory model in which psychophysical critical-band tuning and loudness estimation are combined with a firing-rate model patterned after that of Schroeder and Hall [J. Acoust. Soc. Am. 55, 1055–1060 (1974)]. The signal processing system consists of a critical-bandwidth filter bank, loudness estimation (intensity to the 1/3 power), and a reservoir-type firing-rate model with one internal state for each band. This model enhances transient events in the auditory signal, and causes rapid stimulus offsets to be marked by outputs smaller than the resting rate. The use of this auditory model in the IBM system produces a 4.4% error rate on a standard corpus of four speakers, while the previous filter-bank signal processor produces 7.4% errors on the same data.
doi_str_mv	10.1121/1.2022857
format	Article
fulltext	fulltext
identifier	ISSN: 0001-4966
ispartof	The Journal of the Acoustical Society of America, 1985-11, Vol.78 (S1), p.S50-S50
issn	0001-4966 1520-8524
language	eng
recordid	cdi_crossref_primary_10_1121_1_2022857
source	AIP Acoustical Society of America
title	Application of an adaptive auditory model to speech recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T19%3A04%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20an%20adaptive%20auditory%20model%20to%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Cohen,%20Jordan%20R.&rft.date=1985-11-01&rft.volume=78&rft.issue=S1&rft.spage=S50&rft.epage=S50&rft.pages=S50-S50&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.2022857&rft_dat=%3Ccrossref%3E10_1121_1_2022857%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
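
The processing chain named in the description (loudness estimation as intensity to the 1/3 power, followed by a reservoir-type firing-rate model with one internal state per band) can be illustrated with a minimal sketch. The abstract gives no equations, so everything below is an assumption made for illustration: the class name `ReservoirBand`, the constants `refill`, `leak`, `spontaneous`, and `dt`, and the first-order update rule are hypothetical and are not taken from Cohen or from Schroeder and Hall. The critical-bandwidth filter bank is omitted; the input is assumed to be the per-frame intensity already measured in one band.

```python
"""Illustrative sketch only: constants and update rule are assumptions."""
from typing import List, Sequence


def loudness(intensity: float) -> float:
    """Cube-root compression (intensity to the 1/3 power); needs intensity >= 0."""
    return intensity ** (1.0 / 3.0)


class ReservoirBand:
    """One band with a single internal state, the reservoir level.

    Assumed dynamics (hypothetical, chosen for simplicity):
        drive  = spontaneous + loudness(intensity)
        output = drive * level
        level += dt * (refill - (leak + drive) * level)
    """

    def __init__(self, refill: float = 1.0, leak: float = 1.0,
                 spontaneous: float = 0.1, dt: float = 0.01) -> None:
        self.refill = refill
        self.leak = leak
        self.spontaneous = spontaneous
        self.dt = dt
        # Start at the steady-state level for silence.
        self.level = refill / (leak + spontaneous)

    def resting_rate(self) -> float:
        """Steady-state output when the input intensity is zero."""
        return self.spontaneous * self.refill / (self.leak + self.spontaneous)

    def step(self, intensity: float) -> float:
        """Emit one output sample, then update the reservoir (forward Euler)."""
        drive = self.spontaneous + loudness(intensity)
        rate = drive * self.level
        self.level += self.dt * (self.refill - (self.leak + drive) * self.level)
        return rate


def run_band(intensities: Sequence[float]) -> List[float]:
    """Run a fresh band over a sequence of per-frame intensities."""
    band = ReservoirBand()
    return [band.step(x) for x in intensities]


# Silence, then a steady tone, then silence again.
demo_input = [0.0] * 100 + [8.0] * 500 + [0.0] * 100
demo_output = run_band(demo_input)
```

Under these assumed dynamics the reservoir is full at stimulus onset, so the first output sample of the tone exceeds the adapted value reached later in the tone. At offset the reservoir is still depleted, so the output falls below `resting_rate()` before recovering. This matches the qualitative behavior the description reports: transients are enhanced, and rapid offsets are marked by outputs smaller than the resting rate.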