Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra
We present an approach for separating two speech signals when only a single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using a minimum mean square error (MMSE) approach. The estimation is obtained from the assumpt...
Main authors: | Radfar, M.H. ; Dansereau, R.M. |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 132 |
---|---|
container_issue | |
container_start_page | 128 |
container_title | |
container_volume | |
creator | Radfar, M.H. ; Dansereau, R.M. |
description | We present an approach for separating two speech signals when only a single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using a minimum mean square error (MMSE) approach. The estimator is derived under the assumption that the sources are modelled by a set of Gaussian subsources, which are related to the mixture through the MIXMAX approximation. The resulting estimator has a closed form and is expressed in terms of the means and variances of the Gaussian subsources. To obtain the two most likely subsources that generate the mixture, we use the estimation-detection technique. We also show that binary mask filtering, which has been used empirically and with no mathematical justification in speech separation techniques, is in fact a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results, in terms of segmental SNR, show that the MMSE estimator outperforms binary mask filtering. |
doi_str_mv | 10.1109/MLSP.2007.4414294 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4414294</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4414294</ieee_id><sourcerecordid>4414294</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-fd8d2bb8b9fb2ec13a1c4ac897786c714e1532c6440a1148f9ad9f1d69d4deaa3</originalsourceid><addsrcrecordid>eNo1UD1PwzAUNF8SpfQHIBZvTAl-jhPbI6rKh5QKpHRgq16cl9aoTYqTDPx7glqmu9OdTqdj7A5EDCDs4zIvPmIphI6VAiWtOmMzq81IR51mWXLOJjLRJrLSfF6wm38jtZdsAmkKkUwVXLNZ130JIUBnoysmzBW-2eyIz7fYNLTjxYHIbXlBBwzY-7bhQzcm-NI3fj_s-ZKw4cX3gIH4IoQ28EXX-_0x2ta8aIfgqHvgebv5K3N9wFt2VeOuo9kJp2z1vFjNX6P8_eVt_pRH3oo-qitTybI0pa1LSQ4SBKfQGau1yZwGRZAm0mVKCQRQprZY2RqqzFaqIsRkyu6PtZ6I1ocwrgo_69NdyS_4AFr1</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Radfar, M.H. ; Dansereau, R.M.</creator><creatorcontrib>Radfar, M.H. ; Dansereau, R.M.</creatorcontrib><description>We present an approach for separating two speech signals when only one single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. 
We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</description><identifier>ISSN: 1551-2541</identifier><identifier>ISBN: 1424415659</identifier><identifier>ISBN: 9781424415656</identifier><identifier>EISSN: 2378-928X</identifier><identifier>EISBN: 9781424415663</identifier><identifier>EISBN: 1424415667</identifier><identifier>DOI: 10.1109/MLSP.2007.4414294</identifier><language>eng</language><publisher>IEEE</publisher><subject>Estimation error ; Filtering ; Filters ; Mean square error methods ; Probability density function ; Source separation ; Speech coding ; Speech processing ; State estimation ; Systems engineering and theory</subject><ispartof>2007 IEEE Workshop on Machine Learning for Signal Processing, 2007, p.128-132</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4414294$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4414294$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Radfar, M.H.</creatorcontrib><creatorcontrib>Dansereau, R.M.</creatorcontrib><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><title>2007 IEEE Workshop on Machine Learning for Signal Processing</title><addtitle>MLSP</addtitle><description>We 
present an approach for separating two speech signals when only one single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</description><subject>Estimation error</subject><subject>Filtering</subject><subject>Filters</subject><subject>Mean square error methods</subject><subject>Probability density function</subject><subject>Source separation</subject><subject>Speech coding</subject><subject>Speech processing</subject><subject>State estimation</subject><subject>Systems engineering and 
theory</subject><issn>1551-2541</issn><issn>2378-928X</issn><isbn>1424415659</isbn><isbn>9781424415656</isbn><isbn>9781424415663</isbn><isbn>1424415667</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2007</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1UD1PwzAUNF8SpfQHIBZvTAl-jhPbI6rKh5QKpHRgq16cl9aoTYqTDPx7glqmu9OdTqdj7A5EDCDs4zIvPmIphI6VAiWtOmMzq81IR51mWXLOJjLRJrLSfF6wm38jtZdsAmkKkUwVXLNZ130JIUBnoysmzBW-2eyIz7fYNLTjxYHIbXlBBwzY-7bhQzcm-NI3fj_s-ZKw4cX3gIH4IoQ28EXX-_0x2ta8aIfgqHvgebv5K3N9wFt2VeOuo9kJp2z1vFjNX6P8_eVt_pRH3oo-qitTybI0pa1LSQ4SBKfQGau1yZwGRZAm0mVKCQRQprZY2RqqzFaqIsRkyu6PtZ6I1ocwrgo_69NdyS_4AFr1</recordid><startdate>200708</startdate><enddate>200708</enddate><creator>Radfar, M.H.</creator><creator>Dansereau, R.M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200708</creationdate><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><author>Radfar, M.H. 
; Dansereau, R.M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-fd8d2bb8b9fb2ec13a1c4ac897786c714e1532c6440a1148f9ad9f1d69d4deaa3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Estimation error</topic><topic>Filtering</topic><topic>Filters</topic><topic>Mean square error methods</topic><topic>Probability density function</topic><topic>Source separation</topic><topic>Speech coding</topic><topic>Speech processing</topic><topic>State estimation</topic><topic>Systems engineering and theory</topic><toplevel>online_resources</toplevel><creatorcontrib>Radfar, M.H.</creatorcontrib><creatorcontrib>Dansereau, R.M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Radfar, M.H.</au><au>Dansereau, R.M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</atitle><btitle>2007 IEEE Workshop on Machine Learning for Signal Processing</btitle><stitle>MLSP</stitle><date>2007-08</date><risdate>2007</risdate><spage>128</spage><epage>132</epage><pages>128-132</pages><issn>1551-2541</issn><eissn>2378-928X</eissn><isbn>1424415659</isbn><isbn>9781424415656</isbn><eisbn>9781424415663</eisbn><eisbn>1424415667</eisbn><abstract>We present an approach for separating two speech signals when only one single recording of their linear mixture is available. 
The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</abstract><pub>IEEE</pub><doi>10.1109/MLSP.2007.4414294</doi><tpages>5</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1551-2541 |
ispartof | 2007 IEEE Workshop on Machine Learning for Signal Processing, 2007, p.128-132 |
issn | 1551-2541 ; 2378-928X |
language | eng |
recordid | cdi_ieee_primary_4414294 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Estimation error ; Filtering ; Filters ; Mean square error methods ; Probability density function ; Source separation ; Speech coding ; Speech processing ; State estimation ; Systems engineering and theory |
title | Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T05%3A01%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Single%20Channel%20Speech%20Separation%20using%20Minimum%20Mean%20Square%20Error%20Estimation%20of%20Sources'%20Log%20Spectra&rft.btitle=2007%20IEEE%20Workshop%20on%20Machine%20Learning%20for%20Signal%20Processing&rft.au=Radfar,%20M.H.&rft.date=2007-08&rft.spage=128&rft.epage=132&rft.pages=128-132&rft.issn=1551-2541&rft.eissn=2378-928X&rft.isbn=1424415659&rft.isbn_list=9781424415656&rft_id=info:doi/10.1109/MLSP.2007.4414294&rft_dat=%3Cieee_6IE%3E4414294%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424415663&rft.eisbn_list=1424415667&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4414294&rfr_iscdi=true |
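
The estimator summarized in the description field can be illustrated for a single frequency bin. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the MIXMAX model z = max(x, y) for the log spectra, one already-selected Gaussian subsource per speaker, and independence between the two sources. The function names (gauss_pdf, gauss_cdf, mmse_mixmax, binary_mask) and the floor parameter are hypothetical. Under these assumptions E[x | z] has a closed form in the means and variances of the subsources. The binary mask shown for comparison assigns the bin to the source with the larger mean, which is one plausible reading of the simplification mentioned in the abstract.

```python
import math


def gauss_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(z, mu, sigma):
    """Cumulative distribution of N(mu, sigma^2) evaluated at z."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def mmse_mixmax(z, mu_x, sigma_x, mu_y, sigma_y):
    """E[x | z] for one bin, assuming z = max(x, y) with independent
    Gaussian log spectra x ~ N(mu_x, sigma_x^2), y ~ N(mu_y, sigma_y^2)."""
    fx, Fx = gauss_pdf(z, mu_x, sigma_x), gauss_cdf(z, mu_x, sigma_x)
    fy, Fy = gauss_pdf(z, mu_y, sigma_y), gauss_cdf(z, mu_y, sigma_y)
    w_x = fx * Fy  # x is the maximum, so x = z
    w_y = fy * Fx  # y is the maximum, so x lies below z
    total = w_x + w_y  # density of the mixture value z
    if total == 0.0:
        return z  # numerical underflow: fall back to the observation
    # Fx * E[x | x <= z] = mu_x * Fx - sigma_x^2 * fx (truncated Gaussian mean)
    below = mu_x * Fx - sigma_x ** 2 * fx
    return (w_x * z + fy * below) / total


def binary_mask(z, mu_x, mu_y, floor=float("-inf")):
    """Hard decision: keep z if x is assumed dominant, else a floor value."""
    return z if mu_x >= mu_y else floor


if __name__ == "__main__":
    # Source x dominant, balanced, and source y dominant.
    for mu_x, mu_y in [(10.0, 0.0), (0.0, 0.0), (0.0, 10.0)]:
        z = max(mu_x, mu_y)
        print(mu_x, mu_y, mmse_mixmax(z, mu_x, 1.0, mu_y, 1.0),
              binary_mask(z, mu_x, mu_y))
```

In this sketch, when one subsource clearly dominates, the estimate approaches either the observed value z or the mean of x, which is where a hard keep-or-discard decision would agree with it. When the two subsources are comparable, the estimate falls between those extremes.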