Modulation Spectrum Equalization for Improved Robust Speech Recognition

We propose novel approaches for equalizing the modulation spectrum for robust feature extraction in speech recognition. Common to all approaches is that the temporal trajectories of the feature parameters are first transformed into the magnitude modulation spectrum. In spectral histogram equalizatio...

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843
Main authors:	SUN, Liang-Che ; LEE, Lin-Shan
Format:	Article
Language:	eng
Subjects:	Applied sciences ; Band pass filters ; Cepstral analysis ; Detection, estimation, filtering, equalization, prediction ; Exact sciences and technology ; Feature normalization ; Histograms ; Information, signal and communications theory ; Modulation ; modulation spectrum ; Modulation, demodulation ; robust feature extraction ; Signal and communications theory ; Signal processing ; Signal to noise ratio ; Signal, noise ; Speech processing ; Telecommunications and information theory ; temporal filter ; Wiener filter

container_end_page	843
container_issue	3
container_start_page	828
container_title	IEEE transactions on audio, speech, and language processing
container_volume	20
creator	SUN, Liang-Che ; LEE, Lin-Shan
description	We propose novel approaches for equalizing the modulation spectrum for robust feature extraction in speech recognition. Common to all approaches is that the temporal trajectories of the feature parameters are first transformed into the magnitude modulation spectrum. In spectral histogram equalization (SHE) and two-band spectral histogram equalization (2B-SHE), we equalize the histogram of the modulation spectrum for each utterance to a reference histogram obtained from clean training data, or perform the equalization with two sub-bands on the modulation spectrum. In magnitude ratio equalization (MRE), we define the magnitude ratio of lower to higher modulation frequency components for each utterance, and equalize this to a reference value obtained from clean training data. These approaches can be viewed as temporal filters that are adapted to each testing utterance. Experiments performed on the Aurora 2 and 4 corpora for small and large vocabulary tasks indicate that significant performance improvements are achievable for all noise conditions. We also show that additional improvements can be obtained when these approaches are integrated with cepstral mean and variance normalization (CMVN), histogram equalization (HEQ), higher order cepstral moment normalization (HOCMN), or the advanced front-end (AFE). We analyze and discuss the reasons for these improvements from different viewpoints with different sets of data, including adaptive temporal filtering, noise behavior on the modulation spectrum, phoneme types, and modulation spectrum distance measures.
doi_str_mv	10.1109/TASL.2011.2166544
format	Article
sourceid	pascalfrancis_RIE
sourcerecordid	25549612
ieee_id	6006516
eissn	1558-7924
coden	ITASD8
publisher	Piscataway, NJ: IEEE
rights	2015 INIST-CNRS
lds50	peer_reviewed
date	2012-03-01
tpages	16
collection	IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; Pascal-Francis ; CrossRef
linktohtml	https://ieeexplore.ieee.org/document/6006516
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=25549612
fulltext	fulltext_linktorsrc
identifier	ISSN: 1558-7916
ispartof	IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843
issn	1558-7916 1558-7924
language	eng
recordid	cdi_pascalfrancis_primary_25549612
source	IEEE Electronic Library (IEL)
subjects	Applied sciences ; Band pass filters ; Cepstral analysis ; Detection, estimation, filtering, equalization, prediction ; Exact sciences and technology ; Feature normalization ; Histograms ; Information, signal and communications theory ; Modulation ; modulation spectrum ; Modulation, demodulation ; robust feature extraction ; Signal and communications theory ; Signal processing ; Signal to noise ratio ; Signal, noise ; Speech processing ; Telecommunications and information theory ; temporal filter ; Wiener filter
title	Modulation Spectrum Equalization for Improved Robust Speech Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T17%3A21%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modulation%20Spectrum%20Equalization%20for%20Improved%20Robust%20Speech%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=SUN,%20Liang-Che&rft.date=2012-03-01&rft.volume=20&rft.issue=3&rft.spage=828&rft.epage=843&rft.pages=828-843&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2011.2166544&rft_dat=%3Cpascalfrancis_RIE%3E25549612%3C/pascalfrancis_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6006516&rfr_iscdi=true
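
The description field in this record outlines two ideas: equalizing the histogram of the magnitude modulation spectrum (SHE) and equalizing a low-to-high magnitude ratio (MRE). The following is a minimal sketch of how such steps might look, not the authors' implementation. Everything beyond the abstract is an assumption made for illustration: the use of NumPy's real FFT for the modulation spectrum, quantile mapping as the histogram-equalization rule, keeping the original phase, scaling only the low band in MRE, and the names `reference`, `reference_ratio`, and `split`.

```python
import numpy as np

def modulation_spectrum(trajectory):
    """Magnitude and phase of the DFT of one feature trajectory over time."""
    spec = np.fft.rfft(trajectory)
    return np.abs(spec), np.angle(spec)

def equalize_histogram(magnitudes, reference):
    """Map each magnitude to the reference value at the same empirical quantile.

    `reference` is a hypothetical array of magnitudes pooled from clean data.
    """
    ranks = np.argsort(np.argsort(magnitudes))        # rank 0 .. n-1 of each bin
    quantiles = (ranks + 0.5) / len(magnitudes)
    return np.quantile(reference, quantiles)

def spectral_histogram_equalization(trajectory, reference):
    """SHE-style sketch: equalize magnitudes, keep the original phase (assumption)."""
    mag, phase = modulation_spectrum(trajectory)
    new_mag = equalize_histogram(mag, reference)
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(trajectory))

def magnitude_ratio(magnitudes, split):
    """Summed low-band over summed high-band magnitude; `split` is a hypothetical bin index."""
    return magnitudes[:split].sum() / magnitudes[split:].sum()

def magnitude_ratio_equalization(trajectory, reference_ratio, split):
    """MRE-style sketch: rescale the low band so the ratio matches the reference (assumption)."""
    mag, phase = modulation_spectrum(trajectory)
    scale = reference_ratio / magnitude_ratio(mag, split)
    new_mag = mag.copy()
    new_mag[:split] *= scale
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(trajectory))
```

Both functions return a time-domain trajectory of the same length as the input, which fits the abstract's view of these methods as temporal filters adapted to each utterance. In practice they would be applied per feature dimension, with `reference` and `reference_ratio` estimated from clean training data.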