Learning Spectral Mapping for Speech Dereverberation and Denoising

In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Th...

Full description

Bibliographic details
Published in:	IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992
Main authors:	Kun Han; Yuxuan Wang; DeLiang Wang; William S. Woods; Ivo Merks; Tao Zhang
Format:	Article
Language:	English
Subjects:	Deep neural networks (DNNs); denoising; dereverberation; Noise reduction; Reverberation; spectral mapping; Spectrogram; Speech; Speech processing; supervised learning; Time-domain analysis; Training; Voice recognition; Voice response technology
Online access:	Order full text

container_end_page	992
container_issue	6
container_start_page	982
container_title	IEEE/ACM transactions on audio, speech, and language processing
container_volume	23
creator	Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang
description	In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.
doi_str_mv	10.1109/TASLP.2015.2416653
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TASLP_2015_2416653</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7067387</ieee_id><sourcerecordid>3759915841</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</originalsourceid><addsrcrecordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1699210179</pqid></control><display><type>article</type><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><source>IEEE Electronic Library (IEL)</source><creator>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creator><creatorcontrib>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creatorcontrib><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. 
Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2015.2416653</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Deep neural networks (DNNs) ; denoising ; dereverberation ; Noise reduction ; Reverberation ; spectral mapping ; Spectrogram ; Speech ; Speech processing ; supervised learning ; Time-domain analysis ; Training ; Voice recognition ; Voice response technology</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</citedby><cites>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. 
The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><subject>Deep neural networks (DNNs)</subject><subject>denoising</subject><subject>dereverberation</subject><subject>Noise reduction</subject><subject>Reverberation</subject><subject>spectral mapping</subject><subject>Spectrogram</subject><subject>Speech</subject><subject>Speech processing</subject><subject>supervised learning</subject><subject>Time-domain analysis</subject><subject>Training</subject><subject>Voice recognition</subject><subject>Voice response technology</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</recordid><startdate>201506</startdate><enddate>201506</enddate><creator>Kun Han</creator><creator>Yuxuan Wang</creator><creator>DeLiang Wang</creator><creator>Woods, William S.</creator><creator>Merks, Ivo</creator><creator>Tao Zhang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201506</creationdate><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><author>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Deep neural networks (DNNs)</topic><topic>denoising</topic><topic>dereverberation</topic><topic>Noise reduction</topic><topic>Reverberation</topic><topic>spectral mapping</topic><topic>Spectrogram</topic><topic>Speech</topic><topic>Speech processing</topic><topic>supervised learning</topic><topic>Time-domain analysis</topic><topic>Training</topic><topic>Voice recognition</topic><topic>Voice response technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kun Han</au><au>Yuxuan Wang</au><au>DeLiang Wang</au><au>Woods, William S.</au><au>Merks, Ivo</au><au>Tao Zhang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning Spectral Mapping for Speech Dereverberation and Denoising</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2015-06</date><risdate>2015</risdate><volume>23</volume><issue>6</issue><spage>982</spage><epage>992</epage><pages>982-992</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. 
Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2015.2416653</doi><tpages>11</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2329-9290
ispartof	IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992
issn	2329-9290 2329-9304
language	eng
recordid	cdi_crossref_primary_10_1109_TASLP_2015_2416653
source	IEEE Electronic Library (IEL)
subjects	Deep neural networks (DNNs) ; denoising ; dereverberation ; Noise reduction ; Reverberation ; spectral mapping ; Spectrogram ; Speech ; Speech processing ; supervised learning ; Time-domain analysis ; Training ; Voice recognition ; Voice response technology
title	Learning Spectral Mapping for Speech Dereverberation and Denoising
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20Spectral%20Mapping%20for%20Speech%20Dereverberation%20and%20Denoising&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Kun%20Han&rft.date=2015-06&rft.volume=23&rft.issue=6&rft.spage=982&rft.epage=992&rft.pages=982-992&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2015.2416653&rft_dat=%3Cproquest_RIE%3E3759915841%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1699210179&rft_id=info:pmid/&rft_ieee_id=7067387&rfr_iscdi=true
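
Illustrative note: the description field above says that deep neural networks are trained to map the magnitude spectrogram of corrupted speech to that of clean speech. The sketch below shows one way such a spectral mapping could be set up with NumPy. It is not the authors' implementation. The frame length, hop size, context window, log compression, single hidden layer, learning rate, and the synthetic "clean" and "corrupted" signals are all assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=320, hop=160):
    """Short-time Fourier magnitude, one row per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def add_context(features, context=2):
    """Concatenate each frame with `context` neighbouring frames on each side."""
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    n = len(features)
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def forward(params, x):
    """One ReLU hidden layer followed by a linear output layer."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"], hidden

def train_step(params, x, y, lr):
    """One full-batch gradient step on the mean squared error; returns the loss before the update."""
    pred, hidden = forward(params, x)
    err = pred - y
    grad_pred = 2.0 * err / err.size
    grad_hidden = (grad_pred @ params["W2"].T) * (hidden > 0)
    params["W2"] -= lr * (hidden.T @ grad_pred)
    params["b2"] -= lr * grad_pred.sum(axis=0)
    params["W1"] -= lr * (x.T @ grad_hidden)
    params["b1"] -= lr * grad_hidden.sum(axis=0)
    return float(np.mean(err ** 2))

# Synthetic stand-ins (assumptions): a modulated tone as "clean speech",
# a decaying-noise impulse response as the room, white noise as background.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)
rir = 0.1 * rng.normal(size=1600) * np.exp(-np.arange(1600) / 300.0)
rir[0] = 1.0
corrupted = np.convolve(clean, rir)[:sr] + 0.05 * rng.normal(size=sr)

# Log-compressed magnitude spectrograms: corrupted input, clean target.
spec_clean = np.log1p(magnitude_spectrogram(clean))
spec_corrupted = np.log1p(magnitude_spectrogram(corrupted))
x = add_context(spec_corrupted)
x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)  # per-feature normalization
y = spec_clean

params = {
    "W1": rng.normal(0.0, 0.05, (x.shape[1], 64)), "b1": np.zeros(64),
    "W2": rng.normal(0.0, 0.05, (64, y.shape[1])), "b2": np.zeros(y.shape[1]),
}
losses = [train_step(params, x, y, lr=0.02) for _ in range(100)]
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In this sketch the network only predicts magnitudes. Resynthesizing a waveform would additionally need a phase estimate (for example the phase of the corrupted signal), which is outside what the record describes.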