Deep neural network based speech enhancement using mono channel mask

Obtaining enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. T...

Bibliographic details
Published in: International journal of speech technology 2019-09, Vol.22 (3), p.841-850
Main authors: Ingale, Pallavi P., Nalbalwar, Sanjay L.
Format: Article
Language: eng
Online access: Full text
container_end_page 850
container_issue 3
container_start_page 841
container_title International journal of speech technology
container_volume 22
creator Ingale, Pallavi P.
Nalbalwar, Sanjay L.
description Obtaining enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN. The DNN finds the correct spectral structure in the frames associated with the target speech, which are then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations using perceptual evaluation of speech quality (PESQ) and normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The obtained PESQ values show that the proposed method improves speech quality in noisy conditions. The experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement. The proposed method performs better than the other methods it is compared with.
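The abstract relies on the idea of a binary mask over time–frequency (T–F) units. The sketch below illustrates only that general idea, using the classical oracle ("ideal") binary mask: a unit is kept when its local signal-to-noise ratio exceeds a threshold. It is not the authors' method, which estimates the mask from a cochleagram, sub-harmonic summation, and a DNN. The function names `binary_mask` and `apply_mask`, the 0 dB threshold, and the toy energies are all assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def binary_mask(target_energy: Matrix, noise_energy: Matrix,
                threshold_db: float = 0.0) -> List[List[int]]:
    """Return 1 for T-F units whose local SNR exceeds threshold_db, else 0.

    Hypothetical helper: it assumes oracle access to separate target and
    noise energies, which a real enhancement system does not have.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(t_row, n_row):
            snr_db = 10.0 * math.log10((t + eps) / (n + eps))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture: Matrix, mask: List[List[int]]) -> Matrix:
    """Zero out the mixture's T-F units wherever the mask is 0."""
    return [[m * k for m, k in zip(m_row, k_row)]
            for m_row, k_row in zip(mixture, mask)]


# Toy 2x2 example (rows = frames, columns = frequency channels).
target = [[4.0, 0.1], [0.2, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mixture = [[5.0, 1.1], [1.2, 10.0]]

mask = binary_mask(target, noise)
enhanced = apply_mask(mixture, mask)
```

In this toy case the two noise-dominated units are set to zero and the two speech-dominated units pass through unchanged, which is the sense in which a mask avoids interference from noisy T–F units.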
doi_str_mv 10.1007/s10772-019-09627-4
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2288804869</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2288804869</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2288804869</pqid></control><display><type>article</type><title>Deep neural network based speech enhancement using mono channel mask</title><source>SpringerNature Journals</source><creator>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creator><creatorcontrib>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creatorcontrib><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-019-09627-4</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Artificial neural networks ; Deep learning ; Engineering ; Neural networks ; Signal,Image and Speech Processing ; Social Sciences ; Speech ; Speech enhancement ; Speech perception ; Speech processing</subject><ispartof>International journal of speech technology, 2019-09, Vol.22 (3), p.841-850</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2019</rights><rights>Copyright Springer Nature B.V. 
2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</citedby><cites>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</cites><orcidid>0000-0002-1010-2022</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-019-09627-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-019-09627-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><title>Deep neural network based speech enhancement using mono channel mask</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Artificial neural networks</subject><subject>Deep learning</subject><subject>Engineering</subject><subject>Neural networks</subject><subject>Signal,Image and Speech Processing</subject><subject>Social Sciences</subject><subject>Speech</subject><subject>Speech enhancement</subject><subject>Speech perception</subject><subject>Speech processing</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Ingale, Pallavi P.</creator><creator>Nalbalwar, Sanjay L.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></search><sort><creationdate>20190901</creationdate><title>Deep neural network based speech enhancement using mono channel mask</title><author>Ingale, Pallavi P. 
; Nalbalwar, Sanjay L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Artificial neural networks</topic><topic>Deep learning</topic><topic>Engineering</topic><topic>Neural networks</topic><topic>Signal,Image and Speech Processing</topic><topic>Social Sciences</topic><topic>Speech</topic><topic>Speech enhancement</topic><topic>Speech perception</topic><topic>Speech processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ingale, Pallavi P.</au><au>Nalbalwar, Sanjay L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep neural network based speech enhancement using mono channel mask</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>22</volume><issue>3</issue><spage>841</spage><epage>850</epage><pages>841-850</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. 
Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-019-09627-4</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1381-2416
ispartof International journal of speech technology, 2019-09, Vol.22 (3), p.841-850
issn 1381-2416
1572-8110
language eng
recordid cdi_proquest_journals_2288804869
source SpringerNature Journals
subjects Algorithms
Artificial Intelligence
Artificial neural networks
Deep learning
Engineering
Neural networks
Signal, Image and Speech Processing
Social Sciences
Speech
Speech enhancement
Speech perception
Speech processing
title Deep neural network based speech enhancement using mono channel mask
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A41%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20neural%20network%20based%20speech%20enhancement%20using%20mono%20channel%20mask&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Ingale,%20Pallavi%20P.&rft.date=2019-09-01&rft.volume=22&rft.issue=3&rft.spage=841&rft.epage=850&rft.pages=841-850&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-019-09627-4&rft_dat=%3Cproquest_cross%3E2288804869%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2288804869&rft_id=info:pmid/&rfr_iscdi=true