Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition

Auditory models for outer periphery processing include a sigmoid-shaped nonlinearity that is even more compressed than standard logarithmic scaling at very low and very high amplitudes. In some studies done at Carnegie Mellon University, it has been shown that this compressive nonlinearity is the mo...

Detailed description

Bibliographic details
Published in:	The Journal of the Acoustical Society of America 2011-10, Vol.130 (4_Supplement), p.2524-2524
Main authors:	Zahorian, Stephen, Wong, Brian
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	2524
container_issue	4_Supplement
container_start_page	2524
container_title	The Journal of the Acoustical Society of America
container_volume	130
creator	Zahorian, Stephen ; Wong, Brian
description	Auditory models for outer periphery processing include a sigmoid-shaped nonlinearity that is even more compressed than standard logarithmic scaling at very low and very high amplitudes. In some studies done at Carnegie Mellon University, it has been shown that this compressive nonlinearity is the most important aspect of the Seneff auditory model in terms of improving accuracy of automatic speech recognition in the presence of noise. However, in this previous work, the nonlinearity was trained for each frequency band of the Mel frequency cepstrum coefficients, thus making it impractical to incorporate in automatic speech recognition systems. In the current study, a compressive nonlinearity is parametrically represented and constructed without training, to allow various degrees of steepness and “rounding” of corners for low and high amplitudes. Using this nonlinearity, experimental results for various noise conditions, and with mismatches in noise between training and test data, were obtained for phone recognition using the TIMIT and NTIMIT databases. The results imply that a fixed compressive nonlinearity can be used to improve automatic speech recognition robustness with respect to mismatches between training and test data.
doi_str_mv	10.1121/1.3655077
format	Article
fulltext	fulltext
identifier	ISSN: 0001-4966
ispartof	The Journal of the Acoustical Society of America, 2011-10, Vol.130 (4_Supplement), p.2524-2524
issn	0001-4966 1520-8524
language	eng
recordid	cdi_crossref_primary_10_1121_1_3655077
source	AIP Journals Complete; Alma/SFX Local Collection; AIP Acoustical Society of America
title	Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T09%3A34%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spectral%20amplitude%20nonlinearities%20for%20improved%20noise%20robustness%20of%20spectral%20features%20for%20use%20in%20automatic%20speech%20recognition&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Zahorian,%20Stephen&rft.date=2011-10-01&rft.volume=130&rft.issue=4_Supplement&rft.spage=2524&rft.epage=2524&rft.pages=2524-2524&rft.issn=0001-4966&rft.eissn=1520-8524&rft_id=info:doi/10.1121/1.3655077&rft_dat=%3Ccrossref%3E10_1121_1_3655077%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true