An Unsupervised Approach to Cochannel Speech Separation

Cochannel (two-talker) speech separation is predominantly addressed using pretrained speaker-dependent models. In this paper, we propose an unsupervised approach to separating cochannel speech. Our approach follows the two main stages of computational auditory scene analysis: segmentation and grouping. For voiced speech segregation, the proposed system utilizes a tandem algorithm for simultaneous grouping and then unsupervised clustering for sequential grouping. The clustering is performed by a search to maximize the ratio of between- and within-group speaker distances while penalizing within-group concurrent pitches. To segregate unvoiced speech, we first produce unvoiced speech segments based on onset/offset analysis. The segments are grouped using the complementary binary masks of segregated voiced speech. Despite its simplicity, our approach produces significant SNR improvements across a range of input SNRs. The proposed system yields competitive performance in comparison to other speaker-independent and model-based methods.
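The sequential-grouping step described in the abstract assigns segregated voiced segments to two speaker groups by maximizing the ratio of between-group to within-group speaker distances while penalizing groupings that place concurrent pitches in the same group. The sketch below is only a minimal illustration of such an objective under stated assumptions; the feature representation (`feat`), the distance measure, the `pitch_frames` bookkeeping, and the exhaustive search are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch of a sequential-grouping objective: maximize the ratio of
# between-group to within-group speaker distances, penalized by the number of
# within-group concurrent-pitch frames (one talker cannot carry two pitches at once).
from itertools import product
import numpy as np

def objective(segments, labels, penalty_weight=1.0):
    """Score one two-group labeling of voiced segments (illustrative only)."""
    groups = [[s for s, l in zip(segments, labels) if l == g] for g in (0, 1)]
    if any(len(g) == 0 for g in groups):
        return -np.inf

    def dist(a, b):
        # Placeholder speaker distance between two segments' feature vectors.
        return np.linalg.norm(a["feat"] - b["feat"])

    within = np.mean([dist(a, b) for g in groups for a in g for b in g if a is not b] or [1e-6])
    between = np.mean([dist(a, b) for a in groups[0] for b in groups[1]])

    # Count frames where two segments in the same group have overlapping pitch.
    concurrent = sum(
        len(set(a["pitch_frames"]) & set(b["pitch_frames"]))
        for g in groups for i, a in enumerate(g) for b in g[i + 1:]
    )
    return between / within - penalty_weight * concurrent

def best_grouping(segments):
    """Exhaustive search over binary labelings; feasible only for a handful of segments."""
    return max(
        (list(lbls) for lbls in product((0, 1), repeat=len(segments))),
        key=lambda lbls: objective(segments, lbls),
    )
```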

Bibliographic Details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2013-01, Vol. 21 (1), p. 122-131
Main Authors: Hu, Ke; Wang, DeLiang
Format: Article
Language: English
Online Access: Order full text
DOI: 10.1109/TASL.2012.2215591
ISSN: 1558-7916; 2329-9290
EISSN: 1558-7924; 2329-9304
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithm design and analysis
Applied sciences
Clustering
Clustering algorithms
cochannel speech separation
Computational auditory scene analysis (CASA)
Computational modeling
Exact sciences and technology
Hidden Markov models
Image processing
Information, signal and communications theory
Mathematical models
Natural language processing
Scene analysis
Segments
Separation
sequential grouping
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal to noise ratio
Signal, noise
Speech
Studies
Telecommunications and information theory
Time frequency analysis
Transaction processing
unsupervised clustering
unvoiced speech segregation