Deep neural network based speech enhancement using mono channel mask

Obtaining enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. T...

Bibliographic details
Published in:	International journal of speech technology 2019-09, Vol.22 (3), p.841-850
Main authors:	Ingale, Pallavi P.; Nalbalwar, Sanjay L.
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Artificial neural networks; Deep learning; Engineering; Neural networks; Signal, Image and Speech Processing; Social Sciences; Speech; Speech enhancement; Speech perception; Speech processing
Online access:	Full text

container_end_page	850
container_issue	3
container_start_page	841
container_title	International journal of speech technology
container_volume	22
creator	Ingale, Pallavi P.; Nalbalwar, Sanjay L.
description	Obtaining enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN. The DNN identifies the correct spectral structure in the frames associated with the target speech, which are then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations using perceptual evaluation of speech quality (PESQ) and normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The PESQ values obtained show that the proposed method improves speech quality in noisy conditions. The experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement. The proposed method performs better than the other methods compared.
doi_str_mv	10.1007/s10772-019-09627-4
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2288804869</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2288804869</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2288804869</pqid></control><display><type>article</type><title>Deep neural network based speech enhancement using mono channel mask</title><source>SpringerNature Journals</source><creator>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creator><creatorcontrib>Ingale, Pallavi P. ; Nalbalwar, Sanjay L.</creatorcontrib><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-019-09627-4</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Artificial neural networks ; Deep learning ; Engineering ; Neural networks ; Signal,Image and Speech Processing ; Social Sciences ; Speech ; Speech enhancement ; Speech perception ; Speech processing</subject><ispartof>International journal of speech technology, 2019-09, Vol.22 (3), p.841-850</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2019</rights><rights>Copyright Springer Nature B.V. 
2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</citedby><cites>FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</cites><orcidid>0000-0002-1010-2022</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-019-09627-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-019-09627-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><title>Deep neural network based speech enhancement using mono channel mask</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. 
Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Artificial neural networks</subject><subject>Deep learning</subject><subject>Engineering</subject><subject>Neural networks</subject><subject>Signal,Image and Speech Processing</subject><subject>Social Sciences</subject><subject>Speech</subject><subject>Speech enhancement</subject><subject>Speech perception</subject><subject>Speech processing</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOD7-gKuA62pe0-YuZcYXDLjRdUjTm3k2rUmL-O-NVnDn6lwO55wLHyFXnN1wxqrbxFlViYJxKBiUoirUEZnxebY05-w431LzQihenpKzlHaMMahAzMhyidjTgGO0hyzDRxf3tLYJG5p6RLehGDY2OGwxDHRM27CmbRc66rIb8EBbm_YX5MTbQ8LLXz0nbw_3r4unYvXy-Ly4WxVOchgKVaLU3mrlobIS5rouoQErGucEl94jAvN1qSQoYZ0CV4J3tmKNQg31XMhzcj3t9rF7HzENZteNMeSXRgitNVO6hJwSU8rFLqWI3vRx29r4aTgz37TMRMtkWuaHllG5JKdSyuGwxvg3_U_rCz6fbSE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Ingale, Pallavi P.</creator><creator>Nalbalwar, Sanjay L.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></search><sort><creationdate>20190901</creationdate><title>Deep neural network based speech enhancement using mono channel mask</title><author>Ingale, Pallavi P. 
; Nalbalwar, Sanjay L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-46e38fa84f97a3958b69d9a2dcc213ffee90fb643942ac49c69fca70d4e89b523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Artificial neural networks</topic><topic>Deep learning</topic><topic>Engineering</topic><topic>Neural networks</topic><topic>Signal,Image and Speech Processing</topic><topic>Social Sciences</topic><topic>Speech</topic><topic>Speech enhancement</topic><topic>Speech perception</topic><topic>Speech processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ingale, Pallavi P.</creatorcontrib><creatorcontrib>Nalbalwar, Sanjay L.</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ingale, Pallavi P.</au><au>Nalbalwar, Sanjay L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep neural network based speech enhancement using mono channel mask</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>22</volume><issue>3</issue><spage>841</spage><epage>850</epage><pages>841-850</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>Getting enhanced speech from the noisy speech signal is a task of particular importance in the area of speech processing. Here we propose a deep neural network (DNN) based speech enhancement method utilising mono channel mask. The proposed method employs cochleagram to find an initial binary mask. 
Then modified sub-harmonic summation algorithm is applied on initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to DNN. DNN finds out the correct spectral structure in the frames associated with the target speech which are further used to develop the mono channel mask. Speech signal is reconstructed using mono channel mask. Mono channel mask avoids the unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations done using perceptual evaluation of speech quality (PESQ) and normalized source to distortion ratio indicate that the proposed method outperforms the state of the art methods in the area of speech enhancement. Obtained values of PESQ shows that proposed method improves the quality of the speech in noisy conditions. The experimental results present the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance compared to other methods.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-019-09627-4</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-1010-2022</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1381-2416
ispartof	International journal of speech technology, 2019-09, Vol.22 (3), p.841-850
issn	1381-2416 1572-8110
language	eng
recordid	cdi_proquest_journals_2288804869
source	SpringerNature Journals
subjects	Algorithms; Artificial Intelligence; Artificial neural networks; Deep learning; Engineering; Neural networks; Signal, Image and Speech Processing; Social Sciences; Speech; Speech enhancement; Speech perception; Speech processing
title	Deep neural network based speech enhancement using mono channel mask
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A41%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20neural%20network%20based%20speech%20enhancement%20using%20mono%20channel%20mask&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Ingale,%20Pallavi%20P.&rft.date=2019-09-01&rft.volume=22&rft.issue=3&rft.spage=841&rft.epage=850&rft.pages=841-850&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-019-09627-4&rft_dat=%3Cproquest_cross%3E2288804869%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2288804869&rft_id=info:pmid/&rfr_iscdi=true