Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions

Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. Wh...

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2012-01, Vol.20 (1), p.92-102
Main authors: Jensen, J., Hendriks, R. C.
Format: Article
Language: eng
Online access: Order full text
container_end_page 102
container_issue 1
container_start_page 92
container_title IEEE transactions on audio, speech, and language processing
container_volume 20
creator Jensen, J.
Hendriks, R. C.
description Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be binary. In this context, we develop and compare binary mask techniques to state-of-the-art continuous gain techniques. We derive spectral magnitude minimum mean-square error binary gain estimators; the binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal binary estimators are closely related to a range of existing, heuristically developed, binary gain estimators. The derived binary gain estimators perform better than existing binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources as measured by speech quality and intelligibility measures. However, even the best binary mask method is significantly outperformed by state-of-the-art continuous gain estimators. The instrumental intelligibility results are confirmed in an intelligibility listening test.
doi_str_mv 10.1109/TASL.2011.2157685
format Article
fullrecord <record><control><sourceid>pascalfrancis_RIE</sourceid><recordid>TN_cdi_pascalfrancis_primary_25473436</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5773481</ieee_id><sourcerecordid>25473436</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-85b559c15f3af6154390da551af753079ffe20c7e0ad574e30201ac367203a7e3</originalsourceid><addsrcrecordid>eNo9kDFvwjAQha2qlUppf0DVxUvHUF-ci8lIEdBKoA7AHB2OjVyBQ-1k4N83EYjpTnrvne59jL2CGAGI4mMzWS9HqQAYpYAqH-MdGwDiOFFFmt3fdsgf2VOMv0JkMs9gwHbrk9FNoANf0d67pq0MXznvju2Rrwz5ZP3XUjB8FkId-Cw27kiNqz3fRuf3_NN5CmdOvuLT2jfOt3Ub-YKc5_PW694Zn9mDpUM0L9c5ZNv5bDP9SpY_i-_pZJnotMAmGeMOsdCAVpLNATNZiIoQgaxCKVRhrUmFVkZQhSozUnR1SctcpUKSMnLI4HJXhzrGYGx5Ct234VyCKHtIZQ-p7CGVV0hd5v2SOVHUdLCBvHbxFkwxU7Ij1fneLj5njLnJqDp1DPIfsupwkg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions</title><source>IEEE Electronic Library (IEL)</source><creator>Jensen, J. ; Hendriks, R. C.</creator><creatorcontrib>Jensen, J. ; Hendriks, R. C.</creatorcontrib><description>Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be binary. In this context, we develop and compare binary mask techniques to state-of-the-art continuous gain techniques. 
We derive spectral magnitude minimum mean-square error binary gain estimators; the binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal binary estimators are closely related to a range of existing, heuristically developed, binary gain estimators. The derived binary gain estimators perform better than existing binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources as measured by speech quality and intelligibility measures. However, even the best binary mask method is significantly outperformed by state-of-the-art continuous gain estimators. The instrumental intelligibility results are confirmed in an intelligibility listening test.</description><identifier>ISSN: 1558-7916</identifier><identifier>EISSN: 1558-7924</identifier><identifier>DOI: 10.1109/TASL.2011.2157685</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Applied sciences ; Auditory system ; Detection, estimation, filtering, equalization, prediction ; Discrete Fourier transforms ; Exact sciences and technology ; Ideal Binary Mask ; Information, signal and communications theory ; Noise measurement ; Signal and communications theory ; Signal processing ; Signal to noise ratio ; Signal, noise ; Spectral Magnitude Estimation ; Speech ; Speech Enhancement ; Speech Intelligibility ; Speech processing ; Speech Quality ; Telecommunications and information theory ; Time frequency analysis</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2012-01, Vol.20 (1), p.92-102</ispartof><rights>2015 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-85b559c15f3af6154390da551af753079ffe20c7e0ad574e30201ac367203a7e3</citedby><cites>FETCH-LOGICAL-c295t-85b559c15f3af6154390da551af753079ffe20c7e0ad574e30201ac367203a7e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5773481$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,4024,27923,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5773481$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=25473436$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Jensen, J.</creatorcontrib><creatorcontrib>Hendriks, R. C.</creatorcontrib><title>Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be binary. In this context, we develop and compare binary mask techniques to state-of-the-art continuous gain techniques. 
We derive spectral magnitude minimum mean-square error binary gain estimators; the binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal binary estimators are closely related to a range of existing, heuristically developed, binary gain estimators. The derived binary gain estimators perform better than existing binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources as measured by speech quality and intelligibility measures. However, even the best binary mask method is significantly outperformed by state-of-the-art continuous gain estimators. The instrumental intelligibility results are confirmed in an intelligibility listening test.</description><subject>Applied sciences</subject><subject>Auditory system</subject><subject>Detection, estimation, filtering, equalization, prediction</subject><subject>Discrete Fourier transforms</subject><subject>Exact sciences and technology</subject><subject>Ideal Binary Mask</subject><subject>Information, signal and communications theory</subject><subject>Noise measurement</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal to noise ratio</subject><subject>Signal, noise</subject><subject>Spectral Magnitude Estimation</subject><subject>Speech</subject><subject>Speech Enhancement</subject><subject>Speech Intelligibility</subject><subject>Speech processing</subject><subject>Speech Quality</subject><subject>Telecommunications and information theory</subject><subject>Time frequency 
analysis</subject><issn>1558-7916</issn><issn>1558-7924</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kDFvwjAQha2qlUppf0DVxUvHUF-ci8lIEdBKoA7AHB2OjVyBQ-1k4N83EYjpTnrvne59jL2CGAGI4mMzWS9HqQAYpYAqH-MdGwDiOFFFmt3fdsgf2VOMv0JkMs9gwHbrk9FNoANf0d67pq0MXznvju2Rrwz5ZP3XUjB8FkId-Cw27kiNqz3fRuf3_NN5CmdOvuLT2jfOt3Ub-YKc5_PW694Zn9mDpUM0L9c5ZNv5bDP9SpY_i-_pZJnotMAmGeMOsdCAVpLNATNZiIoQgaxCKVRhrUmFVkZQhSozUnR1SctcpUKSMnLI4HJXhzrGYGx5Ct234VyCKHtIZQ-p7CGVV0hd5v2SOVHUdLCBvHbxFkwxU7Ij1fneLj5njLnJqDp1DPIfsupwkg</recordid><startdate>201201</startdate><enddate>201201</enddate><creator>Jensen, J.</creator><creator>Hendriks, R. C.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>201201</creationdate><title>Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions</title><author>Jensen, J. ; Hendriks, R. 
C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-85b559c15f3af6154390da551af753079ffe20c7e0ad574e30201ac367203a7e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Applied sciences</topic><topic>Auditory system</topic><topic>Detection, estimation, filtering, equalization, prediction</topic><topic>Discrete Fourier transforms</topic><topic>Exact sciences and technology</topic><topic>Ideal Binary Mask</topic><topic>Information, signal and communications theory</topic><topic>Noise measurement</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal to noise ratio</topic><topic>Signal, noise</topic><topic>Spectral Magnitude Estimation</topic><topic>Speech</topic><topic>Speech Enhancement</topic><topic>Speech Intelligibility</topic><topic>Speech processing</topic><topic>Speech Quality</topic><topic>Telecommunications and information theory</topic><topic>Time frequency analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jensen, J.</creatorcontrib><creatorcontrib>Hendriks, R. C.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jensen, J.</au><au>Hendriks, R. 
C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2012-01</date><risdate>2012</risdate><volume>20</volume><issue>1</issue><spage>92</spage><epage>102</epage><pages>92-102</pages><issn>1558-7916</issn><eissn>1558-7924</eissn><coden>ITASD8</coden><abstract>Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be binary. In this context, we develop and compare binary mask techniques to state-of-the-art continuous gain techniques. We derive spectral magnitude minimum mean-square error binary gain estimators; the binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal binary estimators are closely related to a range of existing, heuristically developed, binary gain estimators. The derived binary gain estimators perform better than existing binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources as measured by speech quality and intelligibility measures. However, even the best binary mask method is significantly outperformed by state-of-the-art continuous gain estimators. 
The instrumental intelligibility results are confirmed in an intelligibility listening test.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2011.2157685</doi><tpages>11</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2012-01, Vol.20 (1), p.92-102
issn 1558-7916
1558-7924
language eng
recordid cdi_pascalfrancis_primary_25473436
source IEEE Electronic Library (IEL)
subjects Applied sciences
Auditory system
Detection, estimation, filtering, equalization, prediction
Discrete Fourier transforms
Exact sciences and technology
Ideal Binary Mask
Information, signal and communications theory
Noise measurement
Signal and communications theory
Signal processing
Signal to noise ratio
Signal, noise
Spectral Magnitude Estimation
Speech
Speech Enhancement
Speech Intelligibility
Speech processing
Speech Quality
Telecommunications and information theory
Time frequency analysis
title Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T21%3A55%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spectral%20Magnitude%20Minimum%20Mean-Square%20Error%20Estimation%20Using%20Binary%20and%20Continuous%20Gain%20Functions&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Jensen,%20J.&rft.date=2012-01&rft.volume=20&rft.issue=1&rft.spage=92&rft.epage=102&rft.pages=92-102&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2011.2157685&rft_dat=%3Cpascalfrancis_RIE%3E25473436%3C/pascalfrancis_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5773481&rfr_iscdi=true
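
The description above states that a binary gain is applied to time-frequency tiles and that the derived binary gain estimators are simple functions of the continuous gain estimators. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the binary gain takes values in {0, 1} and that the criterion is the squared error between the clean magnitude A and the gained noisy magnitude gR. Under those assumptions, comparing E[(A - R)^2 | R] with E[A^2 | R] shows that gain 1 wins exactly when the continuous magnitude gain E[A | R] / R exceeds 1/2. The Wiener-type gain xi / (1 + xi) is used here only as a stand-in for a continuous gain, and the names `wiener_gain`, `binarize_gain`, and `apply_gain` are hypothetical; the estimators in the article itself may differ.

```python
import numpy as np


def wiener_gain(snr_prior):
    """Stand-in continuous gain xi / (1 + xi) per time-frequency tile (assumption)."""
    snr_prior = np.asarray(snr_prior, dtype=float)
    return snr_prior / (1.0 + snr_prior)


def binarize_gain(continuous_gain, threshold=0.5):
    """Map a continuous gain to {0, 1}: keep a tile only if its gain exceeds the threshold.

    The default of 0.5 follows from the squared-error comparison in the lead-in,
    under the assumption that the allowed binary gains are exactly 0 and 1.
    """
    return (np.asarray(continuous_gain) > threshold).astype(float)


def apply_gain(noisy_stft, gain):
    """Scale each complex DFT coefficient of the noisy observation by a real gain."""
    return gain * noisy_stft


# Toy 2 x 3 grid of tiles (frequency x time); all values are illustrative only.
xi = np.array([[0.1, 1.0, 10.0],
               [3.0, 0.5, 2.0]])
noisy = np.array([[1 + 1j, 2 + 0j, 0 - 3j],
                  [4 + 0j, 1 - 1j, 2 + 2j]])

g_cont = wiener_gain(xi)          # continuous gain in [0, 1)
g_bin = binarize_gain(g_cont)     # binary mask derived from the continuous gain
enhanced_binary = apply_gain(noisy, g_bin)
enhanced_continuous = apply_gain(noisy, g_cont)
```

Tiles whose stand-in gain is at or below 0.5 are zeroed by the binary mask, whereas the continuous gain only attenuates them; this is the difference between the two classes of methods that the article compares.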