Deep neural network based speech enhancement using mono channel mask
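Published in: | International journal of speech technology 2019-09, Vol.22 (3), p.841-850 |
---|---|
Main authors: | Ingale, Pallavi P.; Nalbalwar, Sanjay L. |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Artificial Intelligence; Artificial neural networks; Deep learning; Engineering; Neural networks; Signal, Image and Speech Processing; Social Sciences; Speech; Speech enhancement; Speech perception; Speech processing |
Online access: | Full text |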
container_end_page | 850 |
---|---|
container_issue | 3 |
container_start_page | 841 |
container_title | International journal of speech technology |
container_volume | 22 |
creator | Ingale, Pallavi P.; Nalbalwar, Sanjay L. |
description | Getting enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN, which identifies the correct spectral structure in the frames associated with the target speech; these are then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from noisy time–frequency (T–F) units. Objective evaluations using the perceptual evaluation of speech quality (PESQ) and the normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The obtained PESQ values show that the proposed method improves speech quality in noisy conditions. The experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement. Overall, the proposed method performs better than the other methods compared. |
doi_str_mv | 10.1007/s10772-019-09627-4 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2288804869</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2288804869</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2288804869</pqid></control><display><type>article</type><title>Deep neural network based speech enhancement using mono channel mask</title><source>SpringerNature Journals</source><creator>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creator><creatorcontrib>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creatorcontrib><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-019-09627-4</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Artificial neural networks ; Deep learning ; Engineering ; Neural networks ; Signal,Image and Speech Processing ; Social Sciences ; Speech ; Speech enhancement ; Speech perception ; Speech processing</subject><ispartof>International journal of speech technology, 2019-09, Vol.22 (3), p.841-850</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2019</rights><rights>Copyright Springer Nature B.V. 
2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</citedby><cites>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</cites><orcidid>0000-0002-1010-2022</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-019-09627-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-019-09627-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><title>Deep neural network based speech enhancement using mono channel mask</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Artificial neural networks</subject><subject>Deep learning</subject><subject>Engineering</subject><subject>Neural networks</subject><subject>Signal,Image and Speech Processing</subject><subject>Social Sciences</subject><subject>Speech</subject><subject>Speech enhancement</subject><subject>Speech perception</subject><subject>Speech processing</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Ingale, Pallavi P.</creator><creator>Nalbalwar, Sanjay L.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></search><sort><creationdate>20190901</creationdate><title>Deep neural network based speech enhancement using mono channel mask</title><author>Ingale, Pallavi P. 
; Nalbalwar, Sanjay L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Artificial neural networks</topic><topic>Deep learning</topic><topic>Engineering</topic><topic>Neural networks</topic><topic>Signal,Image and Speech Processing</topic><topic>Social Sciences</topic><topic>Speech</topic><topic>Speech enhancement</topic><topic>Speech perception</topic><topic>Speech processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ingale, Pallavi P.</au><au>Nalbalwar, Sanjay L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep neural network based speech enhancement using mono channel mask</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>22</volume><issue>3</issue><spage>841</spage><epage>850</epage><pages>841-850</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. 
Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-019-09627-4</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1381-2416 |
ispartof | International journal of speech technology, 2019-09, Vol.22 (3), p.841-850 |
issn | 1381-2416 (ISSN); 1572-8110 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2288804869 |
source | SpringerNature Journals |
subjects | Algorithms; Artificial Intelligence; Artificial neural networks; Deep learning; Engineering; Neural networks; Signal, Image and Speech Processing; Social Sciences; Speech; Speech enhancement; Speech perception; Speech processing |
title | Deep neural network based speech enhancement using mono channel mask |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A41%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20neural%20network%20based%20speech%20enhancement%20using%20mono%20channel%20mask&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Ingale,%20Pallavi%20P.&rft.date=2019-09-01&rft.volume=22&rft.issue=3&rft.spage=841&rft.epage=850&rft.pages=841-850&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-019-09627-4&rft_dat=%3Cproquest_cross%3E2288804869%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2288804869&rft_id=info:pmid/&rfr_iscdi=true |
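The abstract describes a time–frequency (T–F) masking pipeline that is evaluated partly with a normalized source-to-distortion ratio. The Python sketch below illustrates only the generic masking-and-scoring idea. It is not the authors' method and rests on several assumptions. First, a short-time Fourier transform replaces the cochleagram. Second, an oracle ideal binary mask, computed from separately known clean and noise signals with a 0 dB local criterion, stands in for the DNN-estimated mono channel mask, and no sub-harmonic summation step is modelled. Third, SDR is a plain energy ratio rather than the full BSS Eval measure. Finally, normalized SDR is taken as the SDR gain of the enhanced signal over the noisy input. All function names are hypothetical.

```python
import numpy as np


def _sqrt_hann(n_fft):
    # Square root of a periodic Hann window. Using it for both analysis and
    # synthesis at 50% overlap gives perfect reconstruction, because the
    # squared window (a periodic Hann) sums to 1 across overlapping frames.
    n = np.arange(n_fft)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))


def stft(x, n_fft=512):
    """Frame-wise FFT of a 1-D signal; returns a (frames, n_fft // 2 + 1) array."""
    hop = n_fft // 2
    w = _sqrt_hann(n_fft)
    # Pad so that every original sample is covered by exactly two frames.
    padded = np.concatenate([np.zeros(hop), x, np.zeros(n_fft)])
    n_frames = (len(padded) - n_fft) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length, n_fft=512):
    """Inverse of stft() by windowed overlap-add; returns `length` samples."""
    hop = n_fft // 2
    w = _sqrt_hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * w
    out = np.zeros((frames.shape[0] - 1) * hop + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out[hop:hop + length]  # drop the leading padding


def ideal_binary_mask(S, N, lc_db=0.0, eps=1e-12):
    """1 where a T-F unit's local SNR exceeds lc_db, else 0.

    Oracle mask (assumption): it needs the clean and noise spectra
    separately, unlike the DNN-estimated mask described in the abstract.
    """
    local_snr_db = 20 * np.log10((np.abs(S) + eps) / (np.abs(N) + eps))
    return (local_snr_db > lc_db).astype(float)


def sdr_db(reference, estimate):
    """Simple energy-ratio SDR in dB (not the full BSS Eval decomposition)."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def nsdr_db(clean, noisy, enhanced):
    """Normalized SDR, taken here as the SDR gain over the noisy input."""
    return sdr_db(clean, enhanced) - sdr_db(clean, noisy)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 440 * t)  # toy stand-in for speech
    rng = np.random.default_rng(0)
    noise = rng.normal(scale=np.sqrt(0.5), size=clean.shape)  # about 0 dB SNR
    noisy = clean + noise

    # Apply the mask to the noisy spectrum and resynthesize.
    mask = ideal_binary_mask(stft(clean), stft(noise))
    enhanced = istft(mask * stft(noisy), len(noisy))
    print(f"NSDR: {nsdr_db(clean, noisy, enhanced):.1f} dB")
```

With a 440 Hz tone as a toy target, the oracle mask keeps only the few bins the tone dominates, so the printed NSDR should be clearly positive. A real system has to estimate the mask from the noisy signal alone, which is the role the abstract assigns to the DNN stage.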