Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra

We present an approach for separating two speech signals when only a single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using a minimum mean square error (MMSE) approach. The estimation is obtained from the assumpt...
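
As a rough illustration of the kind of estimator the abstract describes, the Python sketch below computes the conditional mean E[x | z] for one frequency bin, assuming the mixture log spectrum is z = max(x, y) (the MIXMAX approximation) and that x and y are independent Gaussians. The closed form is derived from that model alone and is not copied from the paper; the function names, the scalar single-subsource setup, and the underflow fallback are assumptions made for illustration. The estimation-detection step that selects the most likely subsource pair is not shown.

```python
import math


def gauss_pdf(z, mu, var):
    """Gaussian density with mean mu and variance var, evaluated at z."""
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gauss_cdf(z, mu, var):
    """Gaussian cumulative distribution with mean mu and variance var at z."""
    return 0.5 * (1.0 + math.erf((z - mu) / math.sqrt(2.0 * var)))


def mmse_log_spectrum(z, mu_x, var_x, mu_y, var_y):
    """E[x | z] for one bin, assuming z = max(x, y) with independent Gaussians.

    p(z)     = f_x(z) F_y(z) + f_y(z) F_x(z)
    E[x | z] = [z f_x F_y + f_y (mu_x F_x - var_x f_x)] / p(z)
    """
    fx, Fx = gauss_pdf(z, mu_x, var_x), gauss_cdf(z, mu_x, var_x)
    fy, Fy = gauss_pdf(z, mu_y, var_y), gauss_cdf(z, mu_y, var_y)
    denom = fx * Fy + fy * Fx
    if denom <= 0.0:
        # Hypothetical fallback when both densities underflow to zero.
        return min(z, mu_x)
    # The partial expectation of x over x <= z is mu_x * Fx - var_x * fx.
    return (z * fx * Fy + fy * (mu_x * Fx - var_x * fx)) / denom


def binary_mask(mu_x, mu_y):
    """Hard decision: 1 if source x is expected to dominate the bin, else 0."""
    return 1 if mu_x >= mu_y else 0


if __name__ == "__main__":
    # Source x dominates: the estimate stays close to the observed value z.
    print(mmse_log_spectrum(0.0, 0.0, 1.0, -10.0, 1.0))
    # Source x is dominated: the estimate falls back toward its mean mu_x.
    print(mmse_log_spectrum(0.0, -10.0, 1.0, 0.0, 1.0))
```

In this toy model the estimate approaches z when x clearly dominates and approaches mu_x when x is clearly dominated, which is the hard choice a binary mask makes; between those extremes the estimator blends the two cases.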

Bibliographic details
Main authors: Radfar, M.H., Dansereau, R.M.
Format: Conference proceeding
Language: eng
container_end_page 132
container_issue
container_start_page 128
container_title
container_volume
creator Radfar, M.H.
Dansereau, R.M.
description We present an approach for separating two speech signals when only a single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using a minimum mean square error (MMSE) approach. The estimation rests on the assumption that the sources are modelled by a set of Gaussian subsources which are related to the mixture through the MIXMAX approximation. The resulting estimator has a closed form and is expressed in terms of the means and variances of the Gaussian subsources. To obtain the two most likely subsources that generate the mixture, we use the estimation-detection technique. We also show that binary mask filtering, which has been used empirically and with no mathematical justification in speech separation techniques, is in fact a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask on male-male, female-female, and female-male mixtures. The experimental results, measured by segmental SNR, show that the MMSE estimator outperforms binary mask filtering.
doi_str_mv 10.1109/MLSP.2007.4414294
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4414294</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4414294</ieee_id><sourcerecordid>4414294</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-fd8d2bb8b9fb2ec13a1c4ac897786c714e1532c6440a1148f9ad9f1d69d4deaa3</originalsourceid><addsrcrecordid>eNo1UD1PwzAUNF8SpfQHIBZvTAl-jhPbI6rKh5QKpHRgq16cl9aoTYqTDPx7glqmu9OdTqdj7A5EDCDs4zIvPmIphI6VAiWtOmMzq81IR51mWXLOJjLRJrLSfF6wm38jtZdsAmkKkUwVXLNZ130JIUBnoysmzBW-2eyIz7fYNLTjxYHIbXlBBwzY-7bhQzcm-NI3fj_s-ZKw4cX3gIH4IoQ28EXX-_0x2ta8aIfgqHvgebv5K3N9wFt2VeOuo9kJp2z1vFjNX6P8_eVt_pRH3oo-qitTybI0pa1LSQ4SBKfQGau1yZwGRZAm0mVKCQRQprZY2RqqzFaqIsRkyu6PtZ6I1ocwrgo_69NdyS_4AFr1</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Radfar, M.H. ; Dansereau, R.M.</creator><creatorcontrib>Radfar, M.H. ; Dansereau, R.M.</creatorcontrib><description>We present an approach for separating two speech signals when only one single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. 
We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</description><identifier>ISSN: 1551-2541</identifier><identifier>ISBN: 1424415659</identifier><identifier>ISBN: 9781424415656</identifier><identifier>EISSN: 2378-928X</identifier><identifier>EISBN: 9781424415663</identifier><identifier>EISBN: 1424415667</identifier><identifier>DOI: 10.1109/MLSP.2007.4414294</identifier><language>eng</language><publisher>IEEE</publisher><subject>Estimation error ; Filtering ; Filters ; Mean square error methods ; Probability density function ; Source separation ; Speech coding ; Speech processing ; State estimation ; Systems engineering and theory</subject><ispartof>2007 IEEE Workshop on Machine Learning for Signal Processing, 2007, p.128-132</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4414294$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4414294$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Radfar, M.H.</creatorcontrib><creatorcontrib>Dansereau, R.M.</creatorcontrib><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><title>2007 IEEE Workshop on Machine Learning for Signal Processing</title><addtitle>MLSP</addtitle><description>We 
present an approach for separating two speech signals when only one single recording of their linear mixture is available. The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</description><subject>Estimation error</subject><subject>Filtering</subject><subject>Filters</subject><subject>Mean square error methods</subject><subject>Probability density function</subject><subject>Source separation</subject><subject>Speech coding</subject><subject>Speech processing</subject><subject>State estimation</subject><subject>Systems engineering and 
theory</subject><issn>1551-2541</issn><issn>2378-928X</issn><isbn>1424415659</isbn><isbn>9781424415656</isbn><isbn>9781424415663</isbn><isbn>1424415667</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2007</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1UD1PwzAUNF8SpfQHIBZvTAl-jhPbI6rKh5QKpHRgq16cl9aoTYqTDPx7glqmu9OdTqdj7A5EDCDs4zIvPmIphI6VAiWtOmMzq81IR51mWXLOJjLRJrLSfF6wm38jtZdsAmkKkUwVXLNZ130JIUBnoysmzBW-2eyIz7fYNLTjxYHIbXlBBwzY-7bhQzcm-NI3fj_s-ZKw4cX3gIH4IoQ28EXX-_0x2ta8aIfgqHvgebv5K3N9wFt2VeOuo9kJp2z1vFjNX6P8_eVt_pRH3oo-qitTybI0pa1LSQ4SBKfQGau1yZwGRZAm0mVKCQRQprZY2RqqzFaqIsRkyu6PtZ6I1ocwrgo_69NdyS_4AFr1</recordid><startdate>200708</startdate><enddate>200708</enddate><creator>Radfar, M.H.</creator><creator>Dansereau, R.M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200708</creationdate><title>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</title><author>Radfar, M.H. 
; Dansereau, R.M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-fd8d2bb8b9fb2ec13a1c4ac897786c714e1532c6440a1148f9ad9f1d69d4deaa3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Estimation error</topic><topic>Filtering</topic><topic>Filters</topic><topic>Mean square error methods</topic><topic>Probability density function</topic><topic>Source separation</topic><topic>Speech coding</topic><topic>Speech processing</topic><topic>State estimation</topic><topic>Systems engineering and theory</topic><toplevel>online_resources</toplevel><creatorcontrib>Radfar, M.H.</creatorcontrib><creatorcontrib>Dansereau, R.M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Radfar, M.H.</au><au>Dansereau, R.M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra</atitle><btitle>2007 IEEE Workshop on Machine Learning for Signal Processing</btitle><stitle>MLSP</stitle><date>2007-08</date><risdate>2007</risdate><spage>128</spage><epage>132</epage><pages>128-132</pages><issn>1551-2541</issn><eissn>2378-928X</eissn><isbn>1424415659</isbn><isbn>9781424415656</isbn><eisbn>9781424415663</eisbn><eisbn>1424415667</eisbn><abstract>We present an approach for separating two speech signals when only one single recording of their linear mixture is available. 
The log spectra of the sources are estimated from the mixture's log spectrum using minimum mean square error (MMSE) approach. The estimation is obtained from the assumption that the sources are modelled using a set of Gaussian subsources which are related to the mixture using MIXMAX approximation. The resulting estimator has a closed form and is expressed using the mean and variance of Gaussian subsources. In order to obtain the two most likely subsources which generate the mixture, we use the estimation-detection technique. We also show that the binary mask filtering which has been empirically - and with no mathematical justification - used in speech separation techniques is, in fact, a simplified form of the MMSE estimator. The proposed technique is compared with the binary mask when the input consists of male-male, female-female, and female-male mixtures. The experimental results in terms of segmental SNR show that the MMSE estimator outperforms binary mask filtering.</abstract><pub>IEEE</pub><doi>10.1109/MLSP.2007.4414294</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1551-2541
ispartof 2007 IEEE Workshop on Machine Learning for Signal Processing, 2007, p.128-132
issn 1551-2541
2378-928X
language eng
recordid cdi_ieee_primary_4414294
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Estimation error
Filtering
Filters
Mean square error methods
Probability density function
Source separation
Speech coding
Speech processing
State estimation
Systems engineering and theory
title Single Channel Speech Separation using Minimum Mean Square Error Estimation of Sources' Log Spectra
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T05%3A01%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Single%20Channel%20Speech%20Separation%20using%20Minimum%20Mean%20Square%20Error%20Estimation%20of%20Sources'%20Log%20Spectra&rft.btitle=2007%20IEEE%20Workshop%20on%20Machine%20Learning%20for%20Signal%20Processing&rft.au=Radfar,%20M.H.&rft.date=2007-08&rft.spage=128&rft.epage=132&rft.pages=128-132&rft.issn=1551-2541&rft.eissn=2378-928X&rft.isbn=1424415659&rft.isbn_list=9781424415656&rft_id=info:doi/10.1109/MLSP.2007.4414294&rft_dat=%3Cieee_6IE%3E4414294%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424415663&rft.eisbn_list=1424415667&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4414294&rfr_iscdi=true