Application of an adaptive auditory model to speech recognition
One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM...
Published in: | The Journal of the Acoustical Society of America 1985-11, Vol.78 (S1), p.S50-S50 |
---|---|
Main author: | Cohen, Jordan R. |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | S50 |
---|---|
container_issue | S1 |
container_start_page | S50 |
container_title | The Journal of the Acoustical Society of America |
container_volume | 78 |
creator | Cohen, Jordan R. |
description | One approach to designing signal processors for speech recognition has been to model the mammalian auditory system. Most designs have not attempted to capture the time-varying nature of the system, but have focused on the psychophysical aspects of critical bandwidth and loudness estimation. The IBM 5000-word speech recognition system [Bahl et al., IEEE Trans. Pattern Anal. Machine Intell. PAMI-5, 179–190 (1983)] uses an auditory model in which psychophysical critical-band tuning and loudness estimation are combined with a firing-rate model patterned after that of Schroeder and Hall [J. Acoust. Soc. Am. 55, 1055–1060 (1974)]. The signal processing system consists of a critical-bandwidth filter bank, loudness estimation (intensity to the 1/3 power), and a reservoir-type firing-rate model with one internal state for each band. This model enhances transient events in the auditory signal, and causes rapid stimulus offsets to be marked by outputs smaller than the resting rate. The use of this auditory model in the IBM system produces a 4.4% error rate on a standard corpus of four speakers, while the previous filter-bank signal processor produces 7.4% errors on the same data. |
doi_str_mv | 10.1121/1.2022857 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0001-4966 |
ispartof | The Journal of the Acoustical Society of America, 1985-11, Vol.78 (S1), p.S50-S50 |
issn | 0001-4966 1520-8524 |
language | eng |
recordid | cdi_crossref_primary_10_1121_1_2022857 |
source | AIP Acoustical Society of America |
title | Application of an adaptive auditory model to speech recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T19%3A04%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20an%20adaptive%20auditory%20model%20to%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Cohen,%20Jordan%20R.&rft.date=1985-11-01&rft.volume=78&rft.issue=S1&rft.spage=S50&rft.epage=S50&rft.pages=S50-S50&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.2022857&rft_dat=%3Ccrossref%3E10_1121_1_2022857%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
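
The abstract describes the processing chain only in outline: a critical-bandwidth filter bank, loudness estimation (intensity to the 1/3 power), and a reservoir-type firing-rate model with one internal state per band. The sketch below illustrates how such a chain could behave; it is not the IBM implementation. It assumes the filter bank has already produced per-band intensities, and the reservoir update rule, the function names, and every parameter value (`replenish`, `leak`, `spontaneous`, `dt`) are hypothetical choices made for illustration, loosely in the spirit of a depletable-reservoir model.

```python
def loudness(intensity):
    """Loudness estimate from the abstract: intensity to the 1/3 power."""
    return intensity ** (1.0 / 3.0)


def reservoir_model(frames, replenish=1.0, leak=1.0, spontaneous=1.0, dt=0.01):
    """Toy reservoir-type firing-rate model (illustrative assumption only).

    frames: list of frames, each a list of non-negative per-band intensities
            (the output of a critical-band filter bank, not modeled here).
    Each band keeps one internal state, the reservoir level. The output rate
    is (spontaneous + loudness) * level, and firing depletes the reservoir.
    """
    n_bands = len(frames[0])
    # Resting level: the steady state when the input is silent.
    rest_level = replenish / (leak + spontaneous)
    state = [rest_level] * n_bands
    outputs = []
    for frame in frames:
        out_frame = []
        for b, intensity in enumerate(frame):
            drive = spontaneous + loudness(intensity)
            rate = drive * state[b]
            out_frame.append(rate)
            # Euler step: refill at a constant rate, lose by leak and firing.
            state[b] += dt * (replenish - leak * state[b] - rate)
        outputs.append(out_frame)
    return outputs


# One band: silence, then a steady tone, then silence again.
frames = [[0.0]] * 100 + [[1000.0]] * 300 + [[0.0]] * 100
rates = [frame[0] for frame in reservoir_model(frames)]

resting = rates[0]      # rate during the initial silence
onset = rates[100]      # first frame of the tone
sustained = rates[399]  # last frame of the tone
offset = rates[400]     # first frame after the tone ends

print("resting  ", resting)
print("onset    ", onset)
print("sustained", sustained)
print("offset   ", offset)
```

With these assumed parameters the onset frame produces a rate well above the sustained response, and the first frame after the tone falls below the resting rate because the reservoir is still depleted. This is the qualitative behavior the abstract reports: enhanced transients and rapid stimulus offsets marked by outputs smaller than the resting rate.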