Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition
Auditory models for outer periphery processing include a sigmoid-shaped nonlinearity that is even more compressed than standard logarithmic scaling at very low and very high amplitudes. In some studies done at Carnegie Mellon University, it has been shown that this compressive nonlinearity is the mo...
Published in: | The Journal of the Acoustical Society of America 2011-10, Vol.130 (4_Supplement), p.2524-2524 |
---|---|
Main authors: | Zahorian, Stephen ; Wong, Brian |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 2524 |
---|---|
container_issue | 4_Supplement |
container_start_page | 2524 |
container_title | The Journal of the Acoustical Society of America |
container_volume | 130 |
creator | Zahorian, Stephen ; Wong, Brian |
description | Auditory models for outer periphery processing include a sigmoid-shaped nonlinearity that is even more compressed than standard logarithmic scaling at very low and very high amplitudes. In some studies done at Carnegie Mellon University, it has been shown that this compressive nonlinearity is the most important aspect of the Seneff auditory model in terms of improving accuracy of automatic speech recognition in the presence of noise. However, in this previous work, the nonlinearity was trained for each frequency band of the Mel frequency cepstrum coefficients, thus making it impractical to incorporate in automatic speech recognition systems. In the current study, a compressive nonlinearity is parametrically represented and constructed without training, to allow various degrees of steepness and “rounding” of corners for low and high amplitudes. Using this nonlinearity, experimental results for various noise conditions, and with mismatches in noise between training and test data, were obtained for phone recognition using the TIMIT and NTIMIT databases. The results imply that a fixed compressive nonlinearity can be used to improve automatic speech recognition robustness with respect to mismatches between training and test data. |
doi_str_mv | 10.1121/1.3655077 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0001-4966 |
ispartof | The Journal of the Acoustical Society of America, 2011-10, Vol.130 (4_Supplement), p.2524-2524 |
issn | 0001-4966 1520-8524 |
language | eng |
recordid | cdi_crossref_primary_10_1121_1_3655077 |
source | AIP Journals Complete; Alma/SFX Local Collection; AIP Acoustical Society of America |
title | Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T09%3A34%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spectral%20amplitude%20nonlinearities%20for%20improved%20noise%20robustness%20of%20spectral%20features%20for%20use%20in%20automatic%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Zahorian,%20Stephen&rft.date=2011-10-01&rft.volume=130&rft.issue=4_Supplement&rft.spage=2524&rft.epage=2524&rft.pages=2524-2524&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.3655077&rft_dat=%3Ccrossref%3E10_1121_1_3655077%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
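
The description above mentions a compressive nonlinearity that is parametrically represented, with adjustable steepness and "rounding" of corners at low and high amplitudes. The record does not give the functional form, so the following is only a minimal sketch under stated assumptions: it uses a generic soft-clipping curve applied to the level in dB, and the function names, parameter names and default values are all hypothetical rather than taken from the authors' work.

```python
import math


def amplitude_to_db(amplitude, floor=1e-10):
    """Convert a linear spectral amplitude to dB, guarding against log(0)."""
    return 20.0 * math.log10(max(amplitude, floor))


def compress(level_db, center_db=50.0, half_range_db=30.0,
             steepness=1.0, rounding=2.0):
    """Map a spectral level in dB to a saturating value between -1 and 1.

    Illustrative assumption only: a soft-clipping curve, not the
    nonlinearity used in the study. All defaults are hypothetical.
    """
    if half_range_db <= 0 or steepness <= 0 or rounding <= 0:
        raise ValueError("half_range_db, steepness and rounding must be positive")
    # Normalised distance from the centre of the operating range.
    u = steepness * (level_db - center_db) / half_range_db
    # Near-linear around u = 0, flattening towards +/-1 at the extremes.
    # Larger `rounding` gives sharper corners; smaller gives rounder ones.
    return u / (1.0 + abs(u) ** rounding) ** (1.0 / rounding)


if __name__ == "__main__":
    for amplitude in (1.0, 100.0, 1000.0, 100000.0):
        level = amplitude_to_db(amplitude)
        print(f"{level:6.1f} dB -> {compress(level):+.3f}")
```

In this sketch `steepness` scales the slope in the middle of the range, while `rounding` controls how abruptly the curve bends into its flat regions. Because the curve is fixed by a handful of parameters, it needs no per-band training, which is the property the description emphasises.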