Application of an adaptive auditory model to speech recognition

One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM...

Bibliographic details
Published in: The Journal of the Acoustical Society of America 1985-11, Vol.78 (S1), p.S50-S50
Main author: Cohen, Jordan R.
Format: Article
Language: eng
Online access: Full text
container_end_page S50
container_issue S1
container_start_page S50
container_title The Journal of the Acoustical Society of America
container_volume 78
creator Cohen, Jordan R.
description One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM 5000-word speech recognition system [Bahl et al., IEEE Trans. Pattern Anal. Machine Intell. PAMI-5, 179–190 (1983)] uses an auditory model in which psychophysical critical-band tuning and loudness estimation are combined with a firing-rate model patterned after that of Schroeder and Hall [J. Acoust. Soc. Am. 55, 1055–1060 (1974)]. The signal processing system consists of a critical-bandwidth filter bank, loudness estimation (intensity to the 1/3 power), and a reservoir-type firing-rate model with one internal state for each band. This model enhances transient events in the auditory signal, and causes rapid stimulus offsets to be marked by outputs smaller than the resting rate. The use of this auditory model in the IBM system produces a 4.4% error rate on a standard corpus of four speakers, while the previous filter-bank signal processor produces 7.4% errors on the same data.
doi_str_mv 10.1121/1.2022857
format Article
fulltext fulltext
identifier ISSN: 0001-4966
ispartof The Journal of the Acoustical Society of America, 1985-11, Vol.78 (S1), p.S50-S50
issn 0001-4966
1520-8524
language eng
recordid cdi_crossref_primary_10_1121_1_2022857
source AIP Acoustical Society of America
title Application of an adaptive auditory model to speech recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T19%3A04%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20an%20adaptive%20auditory%20model%20to%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Cohen,%20Jordan%20R.&rft.date=1985-11-01&rft.volume=78&rft.issue=S1&rft.spage=S50&rft.epage=S50&rft.pages=S50-S50&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.2022857&rft_dat=%3Ccrossref%3E10_1121_1_2022857%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
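
Note on the description field: the abstract outlines a three-stage front end (critical-bandwidth filter bank, loudness as intensity to the 1/3 power, and a reservoir-type firing-rate model with one internal state per band). The record gives no equations, so the Python sketch below is only an illustration under stated assumptions. It skips the filter bank and takes per-band intensities as input, and the reservoir update rule and the parameters `replenish`, `leak`, and `spont` are hypothetical choices, not the Schroeder and Hall model or the IBM implementation. It is meant to show how a single depletable state per band can produce the two behaviours the abstract reports: enhanced onsets, and offsets that dip below the resting rate.

```python
import math


def loudness(intensity):
    """Cube-root compression: intensity to the 1/3 power (from the abstract)."""
    return intensity ** (1.0 / 3.0)


def resting_rate(replenish=1.0, leak=0.5, spont=0.5):
    """Steady output of the assumed reservoir when the input is silent."""
    return spont * replenish / (leak + spont)


def reservoir_model(band_intensities, replenish=1.0, leak=0.5, spont=0.5):
    """Run an assumed one-state-per-band reservoir over frames of intensities.

    band_intensities: list of frames, each a list of non-negative intensities,
    one per band (the critical-band filter bank is not modelled here).
    Returns firing rates with the same shape. All parameters are illustrative.
    """
    n_bands = len(band_intensities[0])
    # Each band's reservoir starts at its resting (zero-input) level.
    state = [replenish / (leak + spont)] * n_bands
    rates = []
    for frame in band_intensities:
        out = []
        for b, intensity in enumerate(frame):
            drive = spont + loudness(intensity)
            # Output is the drive scaled by what is left in the reservoir.
            out.append(drive * state[b])
            k = leak + drive          # total depletion rate for this frame
            target = replenish / k    # level the reservoir settles to
            # Exact one-frame solution of dn/dt = replenish - k * n.
            state[b] = target + (state[b] - target) * math.exp(-k)
        rates.append(out)
    return rates


def relative_error_reduction(old_error, new_error):
    """Fraction of the old error rate removed by the new front end."""
    return (old_error - new_error) / old_error


# One band: 5 silent frames, 20 frames of a steady tone, 5 silent frames.
frames = [[0.0]] * 5 + [[8.0]] * 20 + [[0.0]] * 5
rates = reservoir_model(frames)
print("resting rate:", resting_rate())
print("onset frame:", rates[5][0], "late tone frame:", rates[24][0])
print("first frame after offset:", rates[25][0])
print("relative reduction, 7.4% to 4.4%:", relative_error_reduction(7.4, 4.4))
```

In this sketch the onset frame exceeds the sustained response because the reservoir is still full when the tone starts, and the first frame after the offset falls below the resting rate because the reservoir is still depleted. The last line restates the reported error rates (7.4% versus 4.4%) as a relative reduction of roughly 40%.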