A hybrid model for unsupervised single channel speech separation

The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from single channel speec...

Bibliographic details
Published in: Multimedia tools and applications 2024-02, Vol.83 (5), p.13241-13259
Main authors: Prasanna Kumar, MK; Kumaraswamy, R.
Format: Article
Language: English
Online access: Full text
container_end_page 13259
container_issue 5
container_start_page 13241
container_title Multimedia tools and applications
container_volume 83
creator Prasanna Kumar, MK
Kumaraswamy, R.
description The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from a single-channel speech mixture under unsupervised conditions. The proposed method combines three techniques, namely speech segmentation, NMF (Nonnegative Matrix Factorization) and masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change-over points. The segmentation block groups the speech frames belonging to individual speakers, but the resulting speech samples lack continuity. Therefore, a second stage is built using NMF. The NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by the speech segmentation stage. NMF further separates the individual speech signals in the mixture while maintaining continuity of speech samples over time. To further improve the accuracy of the separated speech signals, various masking methods such as TFR (Time Frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.
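The second and third stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the segmentation stage has already produced a 0/1 speaker label per frame (the hypothetical `labels` array), that the input is a nonnegative magnitude spectrogram `V`, and that NMF uses the standard multiplicative updates for the Euclidean cost, which the record does not confirm. The function names `nmf` and `separate` are illustrative. The TFR mask is left out because the record does not define it.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Approximate nonnegative V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption,
    not taken from the paper). If W_fixed is given, the bases stay
    fixed and only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if W_fixed is None:
        W = rng.random((n_freq, rank)) + EPS
    else:
        W = W_fixed.copy()
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if W_fixed is None:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V, labels, rank=8, n_iter=200):
    """Return soft and hard masks for speaker 0.

    labels: hypothetical per-frame 0/1 output of a segmentation stage.
    """
    # Learn one set of bases per speaker from the frames assigned to it.
    W1, _ = nmf(V[:, labels == 0], rank, n_iter, seed=1)
    W2, _ = nmf(V[:, labels == 1], rank, n_iter, seed=2)
    # Keep the stacked bases fixed and fit activations on the full mixture.
    W = np.hstack([W1, W2])
    _, H = nmf(V, 2 * rank, n_iter, W_fixed=W, seed=3)
    S1 = W1 @ H[:rank]   # speaker 0 estimate
    S2 = W2 @ H[rank:]   # speaker 1 estimate
    soft = S1 / (S1 + S2 + EPS)        # soft mask, values in [0, 1]
    hard = (S1 >= S2).astype(V.dtype)  # hard (binary) mask
    return soft, hard


# Toy data standing in for a magnitude spectrogram (not real speech).
rng = np.random.default_rng(0)
V = rng.random((64, 40))
labels = np.repeat([0, 1], 20)  # first 20 frames speaker 0, rest speaker 1
soft, hard = separate(V, labels, rank=4, n_iter=50)
est1, est2 = soft * V, (1.0 - soft) * V  # masked estimates of each speaker
```

Because the soft mask and its complement sum to one, the two masked estimates add back up to the mixture exactly. The hard mask instead assigns each time-frequency bin wholly to one speaker.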
doi_str_mv 10.1007/s11042-023-16108-z
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918767332</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918767332</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</originalsourceid><addsrcrecordid>eNp9kE9LAzEQxYMoWKtfwFPAczSTpMnuzVL8B4IXPYfdbNJuabNrpiu0n97oCnryNA_mvTfMj5BL4NfAublBAK4E40Iy0MALdjgiE5gZyYwRcPxHn5IzxDXnoGdCTcjtnK72dWobuu0av6GhS3SIOPQ-fbToG4ptXG48dasqxrzH3nu3ouj7KlW7tovn5CRUG_QXP3NK3u7vXheP7Pnl4Wkxf2ZOQrljBbhQQiF1AKWFNKFURaVckKXhdVN7zZ0uS1NzOSsaI7JyGqTIXynjlAxySq7G3j5174PHnV13Q4r5pBW52GgjpcguMbpc6hCTD7ZP7bZKewvcfpGyIymbSdlvUvaQQ3IMYTbHpU-_1f-kPgGq6Wrc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918767332</pqid></control><display><type>article</type><title>A hybrid model for unsupervised single channel speech separation</title><source>SpringerLink Journals - AutoHoldings</source><creator>Prasanna Kumar, MK ; Kumaraswamy, R.</creator><creatorcontrib>Prasanna Kumar, MK ; Kumaraswamy, R.</creatorcontrib><description>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. 
NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</description><identifier>ISSN: 1573-7721</identifier><identifier>ISSN: 1380-7501</identifier><identifier>EISSN: 1573-7721</identifier><identifier>DOI: 10.1007/s11042-023-16108-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition</subject><ispartof>Multimedia tools and applications, 2024-02, Vol.83 (5), p.13241-13259</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</citedby><cites>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11042-023-16108-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11042-023-16108-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Prasanna Kumar, MK</creatorcontrib><creatorcontrib>Kumaraswamy, R.</creatorcontrib><title>A hybrid model for unsupervised single channel speech separation</title><title>Multimedia tools and applications</title><addtitle>Multimed Tools Appl</addtitle><description>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. 
The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</description><subject>Algorithms</subject><subject>Background noise</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Data Structures and Information Theory</subject><subject>Masking</subject><subject>Mathematical models</subject><subject>Mixtures</subject><subject>Multimedia Information Systems</subject><subject>Segmentation</subject><subject>Separation</subject><subject>Special Purpose and Application-Based Systems</subject><subject>Speech</subject><subject>Voice 
recognition</subject><issn>1573-7721</issn><issn>1380-7501</issn><issn>1573-7721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LAzEQxYMoWKtfwFPAczSTpMnuzVL8B4IXPYfdbNJuabNrpiu0n97oCnryNA_mvTfMj5BL4NfAublBAK4E40Iy0MALdjgiE5gZyYwRcPxHn5IzxDXnoGdCTcjtnK72dWobuu0av6GhS3SIOPQ-fbToG4ptXG48dasqxrzH3nu3ouj7KlW7tovn5CRUG_QXP3NK3u7vXheP7Pnl4Wkxf2ZOQrljBbhQQiF1AKWFNKFURaVckKXhdVN7zZ0uS1NzOSsaI7JyGqTIXynjlAxySq7G3j5174PHnV13Q4r5pBW52GgjpcguMbpc6hCTD7ZP7bZKewvcfpGyIymbSdlvUvaQQ3IMYTbHpU-_1f-kPgGq6Wrc</recordid><startdate>20240201</startdate><enddate>20240201</enddate><creator>Prasanna Kumar, MK</creator><creator>Kumaraswamy, R.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20240201</creationdate><title>A hybrid model for unsupervised single channel speech separation</title><author>Prasanna Kumar, MK ; Kumaraswamy, R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Background noise</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Data Structures and Information Theory</topic><topic>Masking</topic><topic>Mathematical models</topic><topic>Mixtures</topic><topic>Multimedia Information Systems</topic><topic>Segmentation</topic><topic>Separation</topic><topic>Special Purpose and Application-Based Systems</topic><topic>Speech</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Prasanna Kumar, 
MK</creatorcontrib><creatorcontrib>Kumaraswamy, R.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Multimedia tools and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Prasanna Kumar, MK</au><au>Kumaraswamy, R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A hybrid model for unsupervised single channel speech separation</atitle><jtitle>Multimedia tools and applications</jtitle><stitle>Multimed Tools Appl</stitle><date>2024-02-01</date><risdate>2024</risdate><volume>83</volume><issue>5</issue><spage>13241</spage><epage>13259</epage><pages>13241-13259</pages><issn>1573-7721</issn><issn>1380-7501</issn><eissn>1573-7721</eissn><abstract>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. 
NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11042-023-16108-z</doi><tpages>19</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1573-7721
ispartof Multimedia tools and applications, 2024-02, Vol.83 (5), p.13241-13259
issn 1573-7721
1380-7501
1573-7721
language eng
recordid cdi_proquest_journals_2918767332
source SpringerLink Journals - AutoHoldings
subjects Algorithms
Background noise
Computer Communication Networks
Computer Science
Data Structures and Information Theory
Masking
Mathematical models
Mixtures
Multimedia Information Systems
Segmentation
Separation
Special Purpose and Application-Based Systems
Speech
Voice recognition
title A hybrid model for unsupervised single channel speech separation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T06%3A21%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20hybrid%20model%20for%20unsupervised%20single%20channel%20speech%20separation&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Prasanna%20Kumar,%20MK&rft.date=2024-02-01&rft.volume=83&rft.issue=5&rft.spage=13241&rft.epage=13259&rft.pages=13241-13259&rft.issn=1573-7721&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-023-16108-z&rft_dat=%3Cproquest_cross%3E2918767332%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918767332&rft_id=info:pmid/&rfr_iscdi=true