A hybrid model for unsupervised single channel speech separation
The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from single channel speec...
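Published in: | Multimedia tools and applications 2024-02, Vol.83 (5), p.13241-13259 |
---|---|
Main authors: | Prasanna Kumar, MK ; Kumaraswamy, R. |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition |
Online access: | Full text |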
container_end_page | 13259 |
---|---|
container_issue | 5 |
container_start_page | 13241 |
container_title | Multimedia tools and applications |
container_volume | 83 |
creator | Prasanna Kumar, MK ; Kumaraswamy, R. |
description | The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from a single channel speech mixture under unsupervised conditions. The proposed method combines three techniques, namely speech segmentation, NMF (Nonnegative Matrix Factorization) and masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change-over points, but the grouped frames lack continuity of the speech samples. Therefore, a second stage is built using NMF. The NMF algorithm separates a speech mixture better when parts of the individual speech signals are known a priori, and this requirement is satisfied by the speech segmentation stage. NMF further separates the individual speech signals in the mixture while maintaining continuity of the speech samples over time. To further improve the accuracy of the separated speech signals, various masking methods such as TFR (Time Frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency. |
doi_str_mv | 10.1007/s11042-023-16108-z |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918767332</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918767332</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</originalsourceid><addsrcrecordid>eNp9kE9LAzEQxYMoWKtfwFPAczSTpMnuzVL8B4IXPYfdbNJuabNrpiu0n97oCnryNA_mvTfMj5BL4NfAublBAK4E40Iy0MALdjgiE5gZyYwRcPxHn5IzxDXnoGdCTcjtnK72dWobuu0av6GhS3SIOPQ-fbToG4ptXG48dasqxrzH3nu3ouj7KlW7tovn5CRUG_QXP3NK3u7vXheP7Pnl4Wkxf2ZOQrljBbhQQiF1AKWFNKFURaVckKXhdVN7zZ0uS1NzOSsaI7JyGqTIXynjlAxySq7G3j5174PHnV13Q4r5pBW52GgjpcguMbpc6hCTD7ZP7bZKewvcfpGyIymbSdlvUvaQQ3IMYTbHpU-_1f-kPgGq6Wrc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918767332</pqid></control><display><type>article</type><title>A hybrid model for unsupervised single channel speech separation</title><source>SpringerLink Journals - AutoHoldings</source><creator>Prasanna Kumar, MK ; Kumaraswamy, R.</creator><creatorcontrib>Prasanna Kumar, MK ; Kumaraswamy, R.</creatorcontrib><description>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. 
NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</description><identifier>ISSN: 1573-7721</identifier><identifier>ISSN: 1380-7501</identifier><identifier>EISSN: 1573-7721</identifier><identifier>DOI: 10.1007/s11042-023-16108-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition</subject><ispartof>Multimedia tools and applications, 2024-02, Vol.83 (5), p.13241-13259</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</citedby><cites>FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11042-023-16108-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11042-023-16108-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Prasanna Kumar, MK</creatorcontrib><creatorcontrib>Kumaraswamy, R.</creatorcontrib><title>A hybrid model for unsupervised single channel speech separation</title><title>Multimedia tools and applications</title><addtitle>Multimed Tools Appl</addtitle><description>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. 
The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</description><subject>Algorithms</subject><subject>Background noise</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Data Structures and Information Theory</subject><subject>Masking</subject><subject>Mathematical models</subject><subject>Mixtures</subject><subject>Multimedia Information Systems</subject><subject>Segmentation</subject><subject>Separation</subject><subject>Special Purpose and Application-Based Systems</subject><subject>Speech</subject><subject>Voice 
recognition</subject><issn>1573-7721</issn><issn>1380-7501</issn><issn>1573-7721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LAzEQxYMoWKtfwFPAczSTpMnuzVL8B4IXPYfdbNJuabNrpiu0n97oCnryNA_mvTfMj5BL4NfAublBAK4E40Iy0MALdjgiE5gZyYwRcPxHn5IzxDXnoGdCTcjtnK72dWobuu0av6GhS3SIOPQ-fbToG4ptXG48dasqxrzH3nu3ouj7KlW7tovn5CRUG_QXP3NK3u7vXheP7Pnl4Wkxf2ZOQrljBbhQQiF1AKWFNKFURaVckKXhdVN7zZ0uS1NzOSsaI7JyGqTIXynjlAxySq7G3j5174PHnV13Q4r5pBW52GgjpcguMbpc6hCTD7ZP7bZKewvcfpGyIymbSdlvUvaQQ3IMYTbHpU-_1f-kPgGq6Wrc</recordid><startdate>20240201</startdate><enddate>20240201</enddate><creator>Prasanna Kumar, MK</creator><creator>Kumaraswamy, R.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20240201</creationdate><title>A hybrid model for unsupervised single channel speech separation</title><author>Prasanna Kumar, MK ; Kumaraswamy, R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-81cf91836f146237f948a4cf3970bdbe60c6997b0358d7297bc613210447c43f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Background noise</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Data Structures and Information Theory</topic><topic>Masking</topic><topic>Mathematical models</topic><topic>Mixtures</topic><topic>Multimedia Information Systems</topic><topic>Segmentation</topic><topic>Separation</topic><topic>Special Purpose and Application-Based Systems</topic><topic>Speech</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Prasanna Kumar, 
MK</creatorcontrib><creatorcontrib>Kumaraswamy, R.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Multimedia tools and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Prasanna Kumar, MK</au><au>Kumaraswamy, R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A hybrid model for unsupervised single channel speech separation</atitle><jtitle>Multimedia tools and applications</jtitle><stitle>Multimed Tools Appl</stitle><date>2024-02-01</date><risdate>2024</risdate><volume>83</volume><issue>5</issue><spage>13241</spage><epage>13259</epage><pages>13241-13259</pages><issn>1573-7721</issn><issn>1380-7501</issn><eissn>1573-7721</eissn><abstract>The performance of any voice recognition platform in real environment depends on how well the desired speech signal is separated from unwanted signals like background noise or background speakers. In this paper, we propose a three stage hybrid model to separate two speakers from single channel speech mixture under unsupervised condition. Proposed method combines three techniques namely speech segmentation, NMF (Nonnegative Matrix Factorization) and Masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change over points. The segmentation block groups the speech frames belonging to individual speakers but lacks in continuity of the speech samples. Therefore a second stage is built using NMF. 
NMF algorithm performs better in separating the speech mixture when parts of the individual speech signals are known a priori. This requirement is satisfied by speech segmentation stage. NMF further separates the individual speech signals in the mixture by maintaining continuity of speech samples over time. To further improve the accuracy of separated speech signals, various masking methods like TFR (Time frequency Ratio), SM (Soft Mask) and HM (Hard Mask) are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single channel speech separation. This model can be applied at the front end of any voice recognition platform to further improve the recognition efficiency.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11042-023-16108-z</doi><tpages>19</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1573-7721 |
ispartof | Multimedia tools and applications, 2024-02, Vol.83 (5), p.13241-13259 |
issn | 1573-7721 1380-7501 1573-7721 |
language | eng |
recordid | cdi_proquest_journals_2918767332 |
source | SpringerLink Journals - AutoHoldings |
subjects | Algorithms ; Background noise ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Masking ; Mathematical models ; Mixtures ; Multimedia Information Systems ; Segmentation ; Separation ; Special Purpose and Application-Based Systems ; Speech ; Voice recognition |
title | A hybrid model for unsupervised single channel speech separation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T06%3A21%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20hybrid%20model%20for%20unsupervised%20single%20channel%20speech%20separation&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Prasanna%20Kumar,%20MK&rft.date=2024-02-01&rft.volume=83&rft.issue=5&rft.spage=13241&rft.epage=13259&rft.pages=13241-13259&rft.issn=1573-7721&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-023-16108-z&rft_dat=%3Cproquest_cross%3E2918767332%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918767332&rft_id=info:pmid/&rfr_iscdi=true |
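
The second and third stages described in the abstract (NMF guided by segments that are already attributed to each speaker, followed by masking) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state the NMF cost function, the rank, or how the masks are defined, so the Euclidean multiplicative updates, the function names `nmf` and `separate`, and the parameters `seg1`, `seg2` and `rank` are all assumptions made for illustration. The TFR mask is left out because its definition is not given in the abstract; only a ratio-style soft mask and a binary hard mask are shown.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf(V, rank, n_iter=200, W=None, update_W=True, seed=0):
    """Approximate a nonnegative matrix V as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    the abstract does not name the cost function). If W is supplied and
    update_W is False, only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS if W is None else W.copy()
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, seg1, seg2, rank=8, mask="soft"):
    """Split a mixture magnitude spectrogram into two speaker estimates.

    seg1 and seg2 are hypothetical inputs: magnitude-spectrogram frames
    that a segmentation stage has attributed to speaker 1 and speaker 2.
    """
    # Learn one basis per speaker from the segmented frames.
    W1, _ = nmf(seg1, rank, seed=1)
    W2, _ = nmf(seg2, rank, seed=2)
    # Keep the stacked bases fixed and fit activations on the mixture.
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, 2 * rank, W=W, update_W=False)
    S1 = W1 @ H[:rank]
    S2 = W2 @ H[rank:]
    if mask == "soft":
        M1 = S1 / (S1 + S2 + EPS)       # values in [0, 1)
    elif mask == "hard":
        M1 = (S1 > S2).astype(float)    # each bin goes to one speaker
    else:
        raise ValueError("mask must be 'soft' or 'hard'")
    return M1 * V_mix, (1.0 - M1) * V_mix
```

Because the second speaker's mask is taken as the complement of the first, the two estimates always sum back to the mixture spectrogram. Recovering time-domain signals would additionally need the mixture phase and an inverse short-time Fourier transform, which this sketch omits.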