Learning Spectral Mapping for Speech Dereverberation and Denoising

In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Th...

Full description

Bibliographic details
Published in: IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992
Main authors: Kun Han; Yuxuan Wang; DeLiang Wang; Woods, William S.; Merks, Ivo; Tao Zhang
Format: Article
Language: eng
Subjects:
container_end_page 992
container_issue 6
container_start_page 982
container_title IEEE/ACM transactions on audio, speech, and language processing
container_volume 23
creator Kun Han
Yuxuan Wang
DeLiang Wang
Woods, William S.
Merks, Ivo
Tao Zhang
description In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.
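The spectral mapping idea in the description above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the frame length and hop, the one-hidden-layer network, the learning rate, and the synthetic "clean" and "corrupted" signals (white noise passed through a toy decaying impulse response plus additive noise). The helper names `log_magnitude_spectrogram`, `standardize`, and `train_mapping` are hypothetical. It only shows the general shape of the technique: compute magnitude spectrograms of corrupted and clean signals, then fit a network that maps one to the other.

```python
import numpy as np

def log_magnitude_spectrogram(signal, frame_len=320, hop=160):
    """Return log-magnitude STFT frames, shape (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def standardize(features):
    """Scale each frequency bin to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

def train_mapping(x, y, hidden=64, lr=0.05, epochs=100, seed=0):
    """Fit a one-hidden-layer ReLU network by gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.maximum(0.0, x @ w1 + b1)   # hidden activations
        err = h @ w2 + b2 - y              # prediction error per frame and bin
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size       # gradient of the loss w.r.t. the prediction
        g_h = (g_out @ w2.T) * (h > 0)     # backpropagate through the ReLU
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return losses

# Synthetic stand-ins for clean and corrupted speech (assumed, not real data).
rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
impulse = rng.normal(size=2000) * np.exp(-np.arange(2000) / 400.0)  # toy room response
corrupted = np.convolve(clean, impulse)[:16000] + 0.1 * rng.normal(size=16000)

x = standardize(log_magnitude_spectrogram(corrupted))  # network input
y = standardize(log_magnitude_spectrogram(clean))      # training target
losses = train_mapping(x, y)
print(x.shape, losses[0], losses[-1])
```

The sketch works on log magnitudes and standardizes each frequency bin so that plain gradient descent stays stable; a practical system would train on real speech corpora and would still need a phase estimate to resynthesize a waveform from the predicted magnitudes.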
doi_str_mv 10.1109/TASLP.2015.2416653
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TASLP_2015_2416653</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7067387</ieee_id><sourcerecordid>3759915841</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</originalsourceid><addsrcrecordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1699210179</pqid></control><display><type>article</type><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><source>IEEE Electronic Library (IEL)</source><creator>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creator><creatorcontrib>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creatorcontrib><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. 
Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2015.2416653</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Deep neural networks (DNNs) ; denoising ; dereverberation ; Noise reduction ; Reverberation ; spectral mapping ; Spectrogram ; Speech ; Speech processing ; supervised learning ; Time-domain analysis ; Training ; Voice recognition ; Voice response technology</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</citedby><cites>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. 
The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><subject>Deep neural networks (DNNs)</subject><subject>denoising</subject><subject>dereverberation</subject><subject>Noise reduction</subject><subject>Reverberation</subject><subject>spectral mapping</subject><subject>Spectrogram</subject><subject>Speech</subject><subject>Speech processing</subject><subject>supervised learning</subject><subject>Time-domain analysis</subject><subject>Training</subject><subject>Voice recognition</subject><subject>Voice response technology</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</recordid><startdate>201506</startdate><enddate>201506</enddate><creator>Kun Han</creator><creator>Yuxuan Wang</creator><creator>DeLiang Wang</creator><creator>Woods, William S.</creator><creator>Merks, Ivo</creator><creator>Tao Zhang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201506</creationdate><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><author>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Deep neural networks (DNNs)</topic><topic>denoising</topic><topic>dereverberation</topic><topic>Noise reduction</topic><topic>Reverberation</topic><topic>spectral mapping</topic><topic>Spectrogram</topic><topic>Speech</topic><topic>Speech processing</topic><topic>supervised learning</topic><topic>Time-domain analysis</topic><topic>Training</topic><topic>Voice recognition</topic><topic>Voice response technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kun Han</au><au>Yuxuan Wang</au><au>DeLiang Wang</au><au>Woods, William S.</au><au>Merks, Ivo</au><au>Tao Zhang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning Spectral Mapping for Speech Dereverberation and Denoising</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2015-06</date><risdate>2015</risdate><volume>23</volume><issue>6</issue><spage>982</spage><epage>992</epage><pages>982-992</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. 
Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2015.2416653</doi><tpages>11</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2329-9290
ispartof IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992
issn 2329-9290
2329-9304
language eng
recordid cdi_crossref_primary_10_1109_TASLP_2015_2416653
source IEEE Electronic Library (IEL)
subjects Deep neural networks (DNNs)
denoising
dereverberation
Noise reduction
Reverberation
spectral mapping
Spectrogram
Speech
Speech processing
supervised learning
Time-domain analysis
Training
Voice recognition
Voice response technology
title Learning Spectral Mapping for Speech Dereverberation and Denoising
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20Spectral%20Mapping%20for%20Speech%20Dereverberation%20and%20Denoising&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Kun%20Han&rft.date=2015-06&rft.volume=23&rft.issue=6&rft.spage=982&rft.epage=992&rft.pages=982-992&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2015.2416653&rft_dat=%3Cproquest_RIE%3E3759915841%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1699210179&rft_id=info:pmid/&rft_ieee_id=7067387&rfr_iscdi=true