A hybrid model for unsupervised single channel speech separation

Bibliographic details
Published in:	Multimedia tools and applications 2024-02, Vol.83 (5), p.13241-13259
Main authors:	Prasanna Kumar, MK ; Kumaraswamy, R.
Format:	Article
Language:	eng
Subject terms:	Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition
Online access:	Full text

container_end_page	13259
container_issue	5
container_start_page	13241
container_title	Multimedia tools and applications
container_volume	83
creator	Prasanna Kumar, MK ; Kumaraswamy, R.
description	The performance of any voice recognition platform in real environments depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from a single-channel speech mixture under unsupervised conditions. The proposed method combines three techniques, namely speech segmentation, NMF (nonnegative matrix factorization) and masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker changeover points. Although the segmentation block groups the speech frames belonging to individual speakers, it does not preserve the continuity of the speech samples. Therefore, a second stage is built using NMF. The NMF algorithm performs better at separating a speech mixture when parts of the individual speech signals are known a priori, and this requirement is satisfied by the speech segmentation stage. NMF then separates the individual speech signals in the mixture while maintaining the continuity of the speech samples over time. To further improve the accuracy of the separated speech signals, masking methods such as TFR (time-frequency ratio), SM (soft mask) and HM (hard mask) are applied. The separation results are compared with those of other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single-channel speech separation, and it can be applied at the front end of any voice recognition platform to improve recognition efficiency.
doi_str_mv	10.1007/s11042-023-16108-z
format	Article
publisher	New York: Springer US
rights	The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
toplevel	peer_reviewed
fulltext	fulltext
identifier	ISSN: 1573-7721
ispartof	Multimedia tools and applications, 2024-02, Vol.83 (5), p.13241-13259
issn	1573-7721 1380-7501 1573-7721
language	eng
recordid	cdi_proquest_journals_2918767332
source	SpringerLink Journals - AutoHoldings
subjects	Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition
title	A hybrid model for unsupervised single channel speech separation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T06%3A21%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20hybrid%20model%20for%20unsupervised%20single%20channel%20speech%20separation&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Prasanna%20Kumar,%20MK&rft.date=2024-02-01&rft.volume=83&rft.issue=5&rft.spage=13241&rft.epage=13259&rft.pages=13241-13259&rft.issn=1573-7721&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-023-16108-z&rft_dat=%3Cproquest_cross%3E2918767332%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918767332&rft_id=info:pmid/&rfr_iscdi=true
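The description outlines a three-stage pipeline: segmentation, then NMF, then masking. The Python/NumPy sketch below illustrates the second and third stages under stated assumptions. It is not the authors' implementation. Assumptions: the segmentation stage has already returned the frame indices where each speaker talks alone (segmentation itself is not implemented here); NMF uses the Euclidean cost with Lee-Seung multiplicative updates; the functions `nmf`, `separate` and `masks`, the rank and the iteration counts are hypothetical; and the mask formulas are generic choices, because the paper's exact definitions of TFR, SM and HM may differ. Here a magnitude ratio stands in for TFR, a power-2 Wiener-style mask stands in for SM, and a binary comparison stands in for HM.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(V, rank=None, n_iter=200, W=None, seed=0):
    """Euclidean NMF (V ~= W @ H) using Lee-Seung multiplicative updates.

    If W is supplied it is kept fixed and only the activations H are learned;
    otherwise both W (with `rank` columns) and H are learned from V.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, frames_spk1, frames_spk2, rank=8, n_iter=200):
    """Hypothetical helper: estimate each speaker's magnitude spectrogram.

    frames_spk1 / frames_spk2 are column indices of V_mix that an assumed
    segmentation stage attributed to a single speaker.
    """
    # Learn a spectral dictionary per speaker from that speaker's own frames.
    W1, _ = nmf(V_mix[:, frames_spk1], rank, n_iter, seed=1)
    W2, _ = nmf(V_mix[:, frames_spk2], rank, n_iter, seed=2)
    # Explain the whole mixture with the fixed, concatenated dictionary.
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, n_iter=n_iter, W=W)
    V1 = W1 @ H[:rank]  # speaker-1 part of the reconstruction
    V2 = W2 @ H[rank:]  # speaker-2 part of the reconstruction
    return V1, V2


def masks(V1, V2):
    """Assumed mask definitions for speaker 1 (speaker 2 uses the complements)."""
    ratio = V1 / (V1 + V2 + EPS)              # magnitude ratio (stand-in for TFR)
    soft = V1**2 / (V1**2 + V2**2 + EPS)      # Wiener-style soft mask (assumed SM)
    hard = (V1 > V2).astype(float)            # binary hard mask (assumed HM)
    return ratio, soft, hard
```

In use, `V_mix` would be the magnitude STFT of the mixture. Each mask would then be multiplied element-wise with the complex mixture STFT (reusing the mixture phase) and inverted back to the time domain, a step omitted here. Keeping the concatenated dictionary fixed while re-estimating only the activations is what lets the single-speaker segments serve as the parts "known a priori" that the description mentions.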