Modulation Spectrum Equalization for Improved Robust Speech Recognition

We propose novel approaches for equalizing the modulation spectrum for robust feature extraction in speech recognition. Common to all approaches is that the temporal trajectories of the feature parameters are first transformed into the magnitude modulation spectrum. In spectral histogram equalizatio...

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843
Main authors: SUN, Liang-Che, LEE, Lin-Shan
Format: Article
Language: eng
container_end_page 843
container_issue 3
container_start_page 828
container_title IEEE transactions on audio, speech, and language processing
container_volume 20
creator SUN, Liang-Che
LEE, Lin-Shan
description We propose novel approaches for equalizing the modulation spectrum for robust feature extraction in speech recognition. Common to all approaches is that the temporal trajectories of the feature parameters are first transformed into the magnitude modulation spectrum. In spectral histogram equalization (SHE) and two-band spectral histogram equalization (2B-SHE), we equalize the histogram of the modulation spectrum for each utterance to a reference histogram obtained from clean training data, or perform the equalization with two sub-bands on the modulation spectrum. In magnitude ratio equalization (MRE), we define the magnitude ratio of lower to higher modulation frequency components for each utterance, and equalize this to a reference value obtained from clean training data. These approaches can be viewed as temporal filters that are adapted to each testing utterance. Experiments performed on the Aurora 2 and 4 corpora for small and large vocabulary tasks indicate that significant performance improvements are achievable for all noise conditions. We also show that additional improvements can be obtained when these approaches are integrated with cepstral mean and variance normalization (CMVN), histogram equalization (HEQ), higher order cepstral moment normalization (HOCMN), or the advanced front-end (AFE). We analyze and discuss the reasons for these improvements from different viewpoints with different sets of data, including adaptive temporal filtering, noise behavior on the modulation spectrum, phoneme types, and modulation spectrum distance measures.
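The general idea in the description (transform a feature trajectory into its magnitude modulation spectrum, then equalize the histogram of those magnitudes to a reference) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no algorithmic details, so the rank-based histogram matching, the reuse of the original phase, the handling of the DC bin, and all function names (`dft`, `idft_real`, `match_histogram`, `equalize_trajectory`) are assumptions made for illustration only.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def match_histogram(values, reference):
    """Rank-based histogram matching (assumed variant): each value is
    replaced by the reference value at the same relative rank."""
    ref = sorted(reference)
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        q = (rank + 0.5) / len(values)  # mid-rank quantile in (0, 1)
        out[i] = ref[min(int(q * len(ref)), len(ref) - 1)]
    return out


def equalize_trajectory(trajectory, reference_mags):
    """Equalize the magnitude modulation spectrum of one feature trajectory.

    trajectory     -- one feature parameter over time (one utterance)
    reference_mags -- magnitudes assumed to come from clean training data
    """
    n = len(trajectory)
    spec = dft(trajectory)
    half = n // 2 + 1  # non-redundant bins 0 .. n//2
    mags = [abs(c) for c in spec[:half]]
    phases = [cmath.phase(c) for c in spec[:half]]
    new_mags = match_histogram(mags, reference_mags)
    # Keep the original phase (an assumption) and swap in new magnitudes.
    new_half = [cmath.rect(m, p) for m, p in zip(new_mags, phases)]
    # Rebuild the conjugate-symmetric half so the output stays real.
    full = new_half + [0j] * (n - half)
    for k in range(1, n - half + 1):
        full[n - k] = new_half[k].conjugate()
    return idft_real(full)
```

A simple sanity check on this sketch: if `reference_mags` is the trajectory's own magnitude spectrum, the output reproduces the input up to rounding error, and if every reference magnitude is doubled, the output is the input scaled by two.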
doi_str_mv 10.1109/TASL.2011.2166544
format Article
coden ITASD8
eissn 1558-7924
ieee_id 6006516
publisher Piscataway, NJ: IEEE
rights 2015 INIST-CNRS
tpages 16
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843
issn 1558-7916
1558-7924
language eng
recordid cdi_pascalfrancis_primary_25549612
source IEEE Electronic Library (IEL)
subjects Applied sciences
Band pass filters
Cepstral analysis
Detection, estimation, filtering, equalization, prediction
Exact sciences and technology
Feature normalization
Histograms
Information, signal and communications theory
Modulation
modulation spectrum
Modulation, demodulation
robust feature extraction
Signal and communications theory
Signal processing
Signal to noise ratio
Signal, noise
Speech processing
Telecommunications and information theory
temporal filter
Wiener filter
title Modulation Spectrum Equalization for Improved Robust Speech Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T17%3A21%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modulation%20Spectrum%20Equalization%20for%20Improved%20Robust%20Speech%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=SUN,%20Liang-Che&rft.date=2012-03-01&rft.volume=20&rft.issue=3&rft.spage=828&rft.epage=843&rft.pages=828-843&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2011.2166544&rft_dat=%3Cpascalfrancis_RIE%3E25549612%3C/pascalfrancis_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6006516&rfr_iscdi=true