Modulation Spectrum Equalization for Improved Robust Speech Recognition
Published in: | IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843 |
---|---|
Main authors: | SUN, Liang-Che ; LEE, Lin-Shan |
Format: | Article |
Language: | eng |
container_end_page | 843 |
---|---|
container_issue | 3 |
container_start_page | 828 |
container_title | IEEE transactions on audio, speech, and language processing |
container_volume | 20 |
creator | SUN, Liang-Che ; LEE, Lin-Shan |
description | We propose novel approaches for equalizing the modulation spectrum for robust feature extraction in speech recognition. Common to all approaches is that the temporal trajectories of the feature parameters are first transformed into the magnitude modulation spectrum. In spectral histogram equalization (SHE), we equalize the histogram of the modulation spectrum of each utterance to a reference histogram obtained from clean training data; in two-band spectral histogram equalization (2B-SHE), we perform this equalization separately on two sub-bands of the modulation spectrum. In magnitude ratio equalization (MRE), we define the magnitude ratio of lower to higher modulation frequency components for each utterance and equalize it to a reference value obtained from clean training data. These approaches can be viewed as temporal filters that are adapted to each test utterance. Experiments on the Aurora 2 and Aurora 4 corpora, covering small- and large-vocabulary tasks, indicate that significant performance improvements are achievable under all noise conditions. We also show that further improvements can be obtained when these approaches are integrated with cepstral mean and variance normalization (CMVN), histogram equalization (HEQ), higher order cepstral moment normalization (HOCMN), or the advanced front-end (AFE). We analyze and discuss the reasons for these improvements from several viewpoints and with different data sets, including adaptive temporal filtering, noise behavior on the modulation spectrum, phoneme types, and modulation spectrum distance measures. |
doi_str_mv | 10.1109/TASL.2011.2166544 |
format | Article |
publisher | IEEE, Piscataway, NJ |
coden | ITASD8 |
rights | 2015 INIST-CNRS |
identifier | ISSN: 1558-7916 |
ispartof | IEEE transactions on audio, speech, and language processing, 2012-03, Vol.20 (3), p.828-843 |
issn | 1558-7916 1558-7924 |
language | eng |
recordid | cdi_pascalfrancis_primary_25549612 |
source | IEEE Electronic Library (IEL) |
subjects | Applied sciences ; Band pass filters ; Cepstral analysis ; Detection, estimation, filtering, equalization, prediction ; Exact sciences and technology ; Feature normalization ; Histograms ; Information, signal and communications theory ; Modulation ; modulation spectrum ; Modulation, demodulation ; robust feature extraction ; Signal and communications theory ; Signal processing ; Signal to noise ratio ; Signal, noise ; Speech processing ; Telecommunications and information theory ; temporal filter ; Wiener filter |
title | Modulation Spectrum Equalization for Improved Robust Speech Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T17%3A21%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modulation%20Spectrum%20Equalization%20for%20Improved%20Robust%20Speech%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=SUN,%20Liang-Che&rft.date=2012-03-01&rft.volume=20&rft.issue=3&rft.spage=828&rft.epage=843&rft.pages=828-843&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2011.2166544&rft_dat=%3Cpascalfrancis_RIE%3E25549612%3C/pascalfrancis_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6006516&rfr_iscdi=true |
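The abstract describes SHE, 2B-SHE and MRE only at a high level. The Python/NumPy block below is a minimal sketch of those three ideas, assuming details the record does not give, and is not the authors' implementation. The following are all assumptions: a 100 Hz frame rate, the hypothetical `cutoff_hz` band split, storing a "reference histogram" as a sorted array of clean-data magnitudes, using quantile mapping as the equalization step, reusing each utterance's original phase, and rescaling only the higher band in MRE. The function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of SHE, 2B-SHE and MRE, NOT the authors' implementation.
# Assumptions (not from the source): one trajectory = one feature parameter over
# the frames of one utterance; frame rate 100 Hz; a "reference histogram" is a
# sorted array of clean-data magnitudes; equalization = quantile mapping; the
# original phase is reused when returning to the time domain.


def _match_histogram(values, ref_sorted):
    """Quantile-map `values` so their empirical CDF follows `ref_sorted`."""
    ranks = np.argsort(np.argsort(values))                 # rank of each value
    cdf = (ranks + 0.5) / len(values)                      # empirical CDF position
    ref_cdf = (np.arange(len(ref_sorted)) + 0.5) / len(ref_sorted)
    return np.interp(cdf, ref_cdf, ref_sorted)


def _analyze(traj, frame_rate):
    """Magnitude/phase modulation spectrum and bin frequencies in Hz."""
    spec = np.fft.rfft(traj)
    freqs = np.fft.rfftfreq(len(traj), d=1.0 / frame_rate)
    return np.abs(spec), np.angle(spec), freqs


def _synthesize(mag, phase, n):
    """Back to a length-n trajectory: modified magnitude, original phase."""
    return np.fft.irfft(mag * np.exp(1j * phase), n=n)


def she(traj, ref_sorted, frame_rate=100.0):
    """SHE: equalize the whole magnitude modulation spectrum to one reference."""
    mag, phase, _ = _analyze(traj, frame_rate)
    return _synthesize(_match_histogram(mag, ref_sorted), phase, len(traj))


def two_band_she(traj, ref_low, ref_high, cutoff_hz, frame_rate=100.0):
    """2B-SHE: equalize the bands below/above `cutoff_hz` to separate references."""
    mag, phase, freqs = _analyze(traj, frame_rate)
    low = freqs < cutoff_hz
    new = mag.copy()
    new[low] = _match_histogram(mag[low], ref_low)
    new[~low] = _match_histogram(mag[~low], ref_high)
    return _synthesize(new, phase, len(traj))


def mre(traj, ref_ratio, cutoff_hz, frame_rate=100.0):
    """MRE: force sum(|low band|) / sum(|high band|) to equal `ref_ratio`.

    Rescaling only the higher band is an assumption; other choices are possible.
    """
    mag, phase, freqs = _analyze(traj, frame_rate)
    low = freqs < cutoff_hz
    ratio = mag[low].sum() / mag[~low].sum()               # lower-to-higher ratio
    new = mag.copy()
    new[~low] *= ratio / ref_ratio                         # now low/high == ref_ratio
    return _synthesize(new, phase, len(traj))
```

In this sketch, each method changes only the magnitudes of the test utterance's modulation spectrum and keeps its phase, so the effect is equivalent to a temporal filter whose response is chosen per utterance. That matches the abstract's interpretation. Because unnormalized DFT magnitudes grow with utterance length, a practical version would presumably need some length normalization when pooling clean reference magnitudes; how the paper handles this is not stated in the record.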