Feature normalization based on non-extensive statistics for speech recognition

► We propose a feature normalization method for robust speech recognition. ► It operates in a spectral domain intermediate between log and linear. ► We name our method q-logarithmic Spectral Mean Normalization (q-LSMN). ► It is based on non-extensive statistics in which additivity does not hold. ► I...
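The idea of a domain "intermediate between log and linear" can be illustrated with the standard Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which equals x - 1 at q = 0 and tends to ln(x) as q approaches 1. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `q_log` and `q_lsmn_sketch` are hypothetical, the parameterization of q may differ from the one used in the article, and the normalization step simply assumes that the per-bin temporal mean is subtracted in the q-log domain, by analogy with LSMN.

```python
import numpy as np


def q_log(x, q):
    """Standard Tsallis q-logarithm; reduces to the natural log at q == 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_lsmn_sketch(spectra, q):
    """Hypothetical q-log mean normalization (assumed form, not from the record).

    spectra: array of shape (frames, bins) with strictly positive values.
    Returns the q-log spectra with each bin's mean over time removed.
    """
    feats = q_log(spectra, q)
    return feats - feats.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    # Synthetic positive "spectra": 100 frames, 8 frequency bins.
    rng = np.random.default_rng(0)
    spectra = rng.uniform(0.1, 10.0, size=(100, 8))
    normalized = q_lsmn_sketch(spectra, q=0.5)
    print(normalized.shape)
```

The q-logarithm is pseudo-additive rather than additive: ln_q(xy) = ln_q(x) + ln_q(y) + (1 - q) ln_q(x) ln_q(y). The extra product term vanishes only at q = 1, which is the sense in which additivity does not hold in this domain.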

Detailed description


Bibliographic details
Published in:	Speech communication 2013-06, Vol.55 (5), p.587-599
Main authors:	Pardede, Hilman F.; Iwano, Koji; Shinoda, Koichi
Format:	Article
Language:	eng
Subjects:	Deterioration; Non-extensive statistics; Normalization; q-Logarithm; Recognition; Robust speech recognition; Robustness; Spectra; Speech; Speech recognition; Statistics
Online access:	Full text

container_end_page	599
container_issue	5
container_start_page	587
container_title	Speech communication
container_volume	55
creator	Pardede, Hilman F.; Iwano, Koji; Shinoda, Koichi
description	► We propose a feature normalization method for robust speech recognition. ► It operates in a spectral domain intermediate between log and linear. ► We name our method q-logarithmic Spectral Mean Normalization (q-LSMN). ► It is based on non-extensive statistics in which additivity does not hold. ► It was better than CMN, MVN, and ETSI AFE in our experiments. Most compensation methods for improving the robustness of speech recognition systems in noisy environments, such as spectral subtraction, CMN, and MVN, rely on the assumption that noise and speech spectra are independent. However, the use of a limited window in signal processing may introduce a cross-term between them, which degrades speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN), which is an extension of log spectral mean normalization (LSMN) to this domain. Recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database, where it significantly outperformed the ETSI AFE front-end.
doi_str_mv	10.1016/j.specom.2013.02.004
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671363098</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167639313000289</els_id><sourcerecordid>1671363098</sourcerecordid><originalsourceid>FETCH-LOGICAL-c495t-b44073b41a8439d9d8f8cb604d79c5eea68961bd78d5bca9c279739a7fe70d7f3</originalsourceid><addsrcrecordid>eNp9kE9PAyEQxYnRxFr9Bh726GVXWOgCFxPT-C9p9KJnwsKs0myhAm3UTy_NevY0k8zvvcl7CF0S3BBMuut1k7ZgwqZpMaENbhuM2RGaEcHbmhPRHqNZwXjdUUlP0VlKa1wIIdoZer4HnXcRKh_iRo_uR2cXfNXrBLYqiw--hq8MPrk9VCmXc8rOpGoIsSpfwXxUsfx-9-4gPEcngx4TXPzNOXq7v3tdPtarl4en5e2qNkwuct0zhjntGdGCUWmlFYMwfYeZ5dIsAHQnZEd6y4Vd9EZL03LJqdR8AI4tH-gcXU2-2xg-d5Cy2rhkYBy1h7BLqqQltKNYioKyCTUxpBRhUNvoNjp-K4LVoT61VlN96lCfwq0q5RTZzSSDEmPvIKpkHHgD1pW8Wdng_jf4BQdufE0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1671363098</pqid></control><display><type>article</type><title>Feature normalization based on non-extensive statistics for speech recognition</title><source>Elsevier ScienceDirect Journals</source><creator>Pardede, Hilman F. ; Iwano, Koji ; Shinoda, Koichi</creator><creatorcontrib>Pardede, Hilman F. ; Iwano, Koji ; Shinoda, Koichi</creatorcontrib><description>► We propose a feature normalization method for robust speech recognition. ► It operates in a spectral domain intermediate between log and linear. ► We name our method q-logarithmic Spectral Mean Normalization (q-LSMN). ► It is based on non-extensive statistics in which additivity does not hold. ► It was better than CMN, MVN, and ETSI AFE in our experiments. Most compensation methods to improve the robustness of speech recognition systems in noisy environments such as spectral subtraction, CMN, and MVN, rely on the fact that noise and speech spectra are independent. 
However, the use of limited window in signal processing may introduce a cross-term between them, which deteriorates the speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN) which is an extension of log spectral mean normalization (LSMN) to this domain. The recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods, CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database. It significantly outperformed ETSI AFE front-end.</description><identifier>ISSN: 0167-6393</identifier><identifier>EISSN: 1872-7182</identifier><identifier>DOI: 10.1016/j.specom.2013.02.004</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Deterioration ; Non-extensive statistics ; Normalization ; q-Logarithm ; Recognition ; Robust speech recognition ; Robustness ; Spectra ; Speech ; Speech recognition ; Statistics</subject><ispartof>Speech communication, 2013-06, Vol.55 (5), p.587-599</ispartof><rights>2013 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c495t-b44073b41a8439d9d8f8cb604d79c5eea68961bd78d5bca9c279739a7fe70d7f3</citedby><cites>FETCH-LOGICAL-c495t-b44073b41a8439d9d8f8cb604d79c5eea68961bd78d5bca9c279739a7fe70d7f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0167639313000289$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Pardede, Hilman 
F.</creatorcontrib><creatorcontrib>Iwano, Koji</creatorcontrib><creatorcontrib>Shinoda, Koichi</creatorcontrib><title>Feature normalization based on non-extensive statistics for speech recognition</title><title>Speech communication</title><description>► We propose a feature normalization method for robust speech recognition. ► It operates in a spectral domain intermediate between log and linear. ► We name our method q-logarithmic Spectral Mean Normalization (q-LSMN). ► It is based on non-extensive statistics in which additivity does not hold. ► It was better than CMN, MVN, and ETSI AFE in our experiments. Most compensation methods to improve the robustness of speech recognition systems in noisy environments such as spectral subtraction, CMN, and MVN, rely on the fact that noise and speech spectra are independent. However, the use of limited window in signal processing may introduce a cross-term between them, which deteriorates the speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN) which is an extension of log spectral mean normalization (LSMN) to this domain. The recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods, CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database. 
It significantly outperformed ETSI AFE front-end.</description><subject>Deterioration</subject><subject>Non-extensive statistics</subject><subject>Normalization</subject><subject>q-Logarithm</subject><subject>Recognition</subject><subject>Robust speech recognition</subject><subject>Robustness</subject><subject>Spectra</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Statistics</subject><issn>0167-6393</issn><issn>1872-7182</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNp9kE9PAyEQxYnRxFr9Bh726GVXWOgCFxPT-C9p9KJnwsKs0myhAm3UTy_NevY0k8zvvcl7CF0S3BBMuut1k7ZgwqZpMaENbhuM2RGaEcHbmhPRHqNZwXjdUUlP0VlKa1wIIdoZer4HnXcRKh_iRo_uR2cXfNXrBLYqiw--hq8MPrk9VCmXc8rOpGoIsSpfwXxUsfx-9-4gPEcngx4TXPzNOXq7v3tdPtarl4en5e2qNkwuct0zhjntGdGCUWmlFYMwfYeZ5dIsAHQnZEd6y4Vd9EZL03LJqdR8AI4tH-gcXU2-2xg-d5Cy2rhkYBy1h7BLqqQltKNYioKyCTUxpBRhUNvoNjp-K4LVoT61VlN96lCfwq0q5RTZzSSDEmPvIKpkHHgD1pW8Wdng_jf4BQdufE0</recordid><startdate>20130601</startdate><enddate>20130601</enddate><creator>Pardede, Hilman F.</creator><creator>Iwano, Koji</creator><creator>Shinoda, Koichi</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20130601</creationdate><title>Feature normalization based on non-extensive statistics for speech recognition</title><author>Pardede, Hilman F. 
; Iwano, Koji ; Shinoda, Koichi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c495t-b44073b41a8439d9d8f8cb604d79c5eea68961bd78d5bca9c279739a7fe70d7f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Deterioration</topic><topic>Non-extensive statistics</topic><topic>Normalization</topic><topic>q-Logarithm</topic><topic>Recognition</topic><topic>Robust speech recognition</topic><topic>Robustness</topic><topic>Spectra</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Statistics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pardede, Hilman F.</creatorcontrib><creatorcontrib>Iwano, Koji</creatorcontrib><creatorcontrib>Shinoda, Koichi</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Speech communication</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pardede, Hilman F.</au><au>Iwano, Koji</au><au>Shinoda, Koichi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Feature normalization based on non-extensive statistics for speech recognition</atitle><jtitle>Speech communication</jtitle><date>2013-06-01</date><risdate>2013</risdate><volume>55</volume><issue>5</issue><spage>587</spage><epage>599</epage><pages>587-599</pages><issn>0167-6393</issn><eissn>1872-7182</eissn><abstract>► We propose a feature normalization 
method for robust speech recognition. ► It operates in a spectral domain intermediate between log and linear. ► We name our method q-logarithmic Spectral Mean Normalization (q-LSMN). ► It is based on non-extensive statistics in which additivity does not hold. ► It was better than CMN, MVN, and ETSI AFE in our experiments. Most compensation methods to improve the robustness of speech recognition systems in noisy environments such as spectral subtraction, CMN, and MVN, rely on the fact that noise and speech spectra are independent. However, the use of limited window in signal processing may introduce a cross-term between them, which deteriorates the speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN) which is an extension of log spectral mean normalization (LSMN) to this domain. The recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods, CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database. It significantly outperformed ETSI AFE front-end.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.specom.2013.02.004</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0167-6393
ispartof	Speech communication, 2013-06, Vol.55 (5), p.587-599
issn	0167-6393 1872-7182
language	eng
recordid	cdi_proquest_miscellaneous_1671363098
source	Elsevier ScienceDirect Journals
subjects	Deterioration; Non-extensive statistics; Normalization; q-Logarithm; Recognition; Robust speech recognition; Robustness; Spectra; Speech; Speech recognition; Statistics
title	Feature normalization based on non-extensive statistics for speech recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T06%3A23%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Feature%20normalization%20based%20on%20non-extensive%20statistics%20for%20speech%20recognition&rft.jtitle=Speech%20communication&rft.au=Pardede,%20Hilman%20F.&rft.date=2013-06-01&rft.volume=55&rft.issue=5&rft.spage=587&rft.epage=599&rft.pages=587-599&rft.issn=0167-6393&rft.eissn=1872-7182&rft_id=info:doi/10.1016/j.specom.2013.02.004&rft_dat=%3Cproquest_cross%3E1671363098%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1671363098&rft_id=info:pmid/&rft_els_id=S0167639313000289&rfr_iscdi=true