Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition

Bibliographic details
Published in: The Journal of the Acoustical Society of America, 2011-10, Vol. 130 (4_Supplement), p. 2524
Main authors: Zahorian, Stephen; Wong, Brian
Format: Article
Language: English (eng)
container_end_page 2524
container_issue 4_Supplement
container_start_page 2524
container_title The Journal of the Acoustical Society of America
container_volume 130
creator Zahorian, Stephen
Wong, Brian
description Auditory models of outer-periphery processing include a sigmoid-shaped nonlinearity that is even more compressive than standard logarithmic scaling at very low and very high amplitudes. Studies at Carnegie Mellon University have shown that this compressive nonlinearity is the most important aspect of the Seneff auditory model for improving the accuracy of automatic speech recognition in the presence of noise. However, in that previous work the nonlinearity was trained separately for each frequency band of the Mel-frequency cepstral coefficient (MFCC) front end, which makes it impractical to incorporate into automatic speech recognition systems. In the current study, a compressive nonlinearity is represented parametrically and constructed without training, allowing various degrees of steepness and “rounding” of the corners at low and high amplitudes. Using this nonlinearity, phone recognition experiments were run on the TIMIT and NTIMIT databases under various noise conditions, including noise mismatches between training and test data. The results imply that a fixed compressive nonlinearity can improve the robustness of automatic speech recognition to mismatches between training and test data.
doi_str_mv 10.1121/1.3655077
format Article
identifier ISSN: 0001-4966
ispartof The Journal of the Acoustical Society of America, 2011-10, Vol.130 (4_Supplement), p.2524-2524
issn 0001-4966
1520-8524
language eng
recordid cdi_crossref_primary_10_1121_1_3655077
source AIP Journals Complete; Alma/SFX Local Collection; AIP Acoustical Society of America
title Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T09%3A34%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spectral%20amplitude%20nonlinearities%20for%20improved%20noise%20robustness%20of%20spectral%20features%20for%20use%20in%20automatic%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Zahorian,%20Stephen&rft.date=2011-10-01&rft.volume=130&rft.issue=4_Supplement&rft.spage=2524&rft.epage=2524&rft.pages=2524-2524&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.3655077&rft_dat=%3Ccrossref%3E10_1121_1_3655077%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
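The abstract describes a fixed, parametrically defined compressive nonlinearity with adjustable steepness and corner rounding, used in place of per-band trained nonlinearities. The record does not give the functional form, so the following is only a minimal Python sketch under stated assumptions: the softplus-based soft clipping of dB values, the parameter names `floor_db`, `ceil_db`, `slope` and `rounding`, the peak normalization, and the helper functions are all hypothetical and are not the authors' formulation. It shows where such a curve would sit in an MFCC-style front end: between the logarithm of the Mel filter-bank energies and the DCT.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)


def compressive_nonlinearity(log_amp, floor_db=-40.0, ceil_db=0.0,
                             slope=1.0, rounding=3.0):
    """Hypothetical sigmoid-shaped compression of log amplitudes (in dB).

    Between floor_db and ceil_db the output is about slope * (x - floor_db),
    i.e. ordinary log scaling with adjustable steepness. Below floor_db and
    above ceil_db the curve flattens, so it is more compressive than a plain
    log at very low and very high amplitudes. `rounding` (in dB) sets how
    gradually the two corners bend; rounding <= 0 gives hard clipping.
    """
    if ceil_db <= floor_db:
        raise ValueError("ceil_db must exceed floor_db")
    x = np.asarray(log_amp, dtype=float)
    if rounding <= 0:
        return slope * (np.clip(x, floor_db, ceil_db) - floor_db)
    r = float(rounding)
    lower = r * softplus((x - floor_db) / r)  # smooth max(x - floor_db, 0)
    upper = r * softplus((x - ceil_db) / r)   # smooth max(x - ceil_db, 0)
    return slope * (lower - upper)


def cepstra_from_filterbank(fb_energies, n_ceps=13, **nl_params):
    """Cepstral features from precomputed Mel filter-bank energies.

    fb_energies: array of shape (frames, bands). Energies are converted to dB
    relative to the utterance peak (an assumption), compressed, then
    decorrelated with an unnormalized DCT-II, replacing the plain logarithm
    of a standard MFCC pipeline with the compressive curve above.
    """
    e = np.maximum(np.asarray(fb_energies, dtype=float), 1e-10)
    log_amp = 10.0 * np.log10(e / e.max())
    compressed = compressive_nonlinearity(log_amp, **nl_params)
    n_bands = compressed.shape[-1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bands)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_bands)  # DCT-II basis rows
    return compressed @ dct_basis.T


if __name__ == "__main__":
    levels = np.array([-80.0, -40.0, -20.0, 0.0, 20.0])
    # Hard corners: values 0, 0, 20, 40, 40
    print(compressive_nonlinearity(levels, rounding=0))
    # Rounded corners: same mid-range behavior, smooth transitions at the ends
    print(compressive_nonlinearity(levels, rounding=3.0))
```

The softplus form was chosen because its two limits are easy to compare: as `rounding` shrinks it approaches hard clipping of the log spectrum, while in the mid range it reduces to ordinary log scaling multiplied by `slope`. Because it needs no training, the same curve can be applied to every frequency band.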