Learning Spectral Mapping for Speech Dereverberation and Denoising
In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Th...
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992 |
---|---|
Main authors: | Kun Han, Yuxuan Wang, DeLiang Wang, William S. Woods, Ivo Merks, Tao Zhang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 992 |
---|---|
container_issue | 6 |
container_start_page | 982 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 23 |
creator | Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang |
description | In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods. |
doi_str_mv | 10.1109/TASLP.2015.2416653 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TASLP_2015_2416653</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7067387</ieee_id><sourcerecordid>3759915841</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</originalsourceid><addsrcrecordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1699210179</pqid></control><display><type>article</type><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><source>IEEE Electronic Library (IEL)</source><creator>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creator><creatorcontrib>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</creatorcontrib><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. 
Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2015.2416653</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Deep neural networks (DNNs) ; denoising ; dereverberation ; Noise reduction ; Reverberation ; spectral mapping ; Spectrogram ; Speech ; Speech processing ; supervised learning ; Time-domain analysis ; Training ; Voice recognition ; Voice response technology</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</citedby><cites>FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7067387$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. 
The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</description><subject>Deep neural networks (DNNs)</subject><subject>denoising</subject><subject>dereverberation</subject><subject>Noise reduction</subject><subject>Reverberation</subject><subject>spectral mapping</subject><subject>Spectrogram</subject><subject>Speech</subject><subject>Speech processing</subject><subject>supervised learning</subject><subject>Time-domain analysis</subject><subject>Training</subject><subject>Voice recognition</subject><subject>Voice response technology</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kM1OwzAQhC0EElXpC8AlEueUtZ1442Mpv1IQSC1ny4k3kKokwU6ReHsSWjjtajSzO_oYO-cw5xz01Xqxyl_mAng6FwlXKpVHbCKk0LGWkBz_7ULDKZuFsAEADqg1JhN2nZP1Td28RauOyt7bbfRku24UqtaPIpXv0Q15-iJfkLd93TaRbdygNW0dBuMZO6nsNtDsMKfs9e52vXyI8-f7x-Uij0upeB8jJRlmKq2GggQZ6sohoEptAsoVvHSpLMrKap5qpzInM4GEkDiBUJTEUU7Z5f5u59vPHYXebNqdb4aXhiutBQeOenCJvav0bQieKtP5-sP6b8PBjLjMLy4z4jIHXEPoYh-qieg_gKBQZih_AM9HZRI</recordid><startdate>201506</startdate><enddate>201506</enddate><creator>Kun Han</creator><creator>Yuxuan Wang</creator><creator>DeLiang Wang</creator><creator>Woods, William S.</creator><creator>Merks, Ivo</creator><creator>Tao Zhang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201506</creationdate><title>Learning Spectral Mapping for Speech Dereverberation and Denoising</title><author>Kun Han ; Yuxuan Wang ; DeLiang Wang ; Woods, William S. ; Merks, Ivo ; Tao Zhang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-7e487865f665e0879fd70765a406db1cd53bcfa9159d68d3827e704d270bce173</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Deep neural networks (DNNs)</topic><topic>denoising</topic><topic>dereverberation</topic><topic>Noise reduction</topic><topic>Reverberation</topic><topic>spectral mapping</topic><topic>Spectrogram</topic><topic>Speech</topic><topic>Speech processing</topic><topic>supervised learning</topic><topic>Time-domain analysis</topic><topic>Training</topic><topic>Voice recognition</topic><topic>Voice response technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kun Han</creatorcontrib><creatorcontrib>Yuxuan Wang</creatorcontrib><creatorcontrib>DeLiang Wang</creatorcontrib><creatorcontrib>Woods, William S.</creatorcontrib><creatorcontrib>Merks, Ivo</creatorcontrib><creatorcontrib>Tao Zhang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kun Han</au><au>Yuxuan Wang</au><au>DeLiang Wang</au><au>Woods, William S.</au><au>Merks, Ivo</au><au>Tao Zhang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning Spectral Mapping for Speech Dereverberation and Denoising</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2015-06</date><risdate>2015</risdate><volume>23</volume><issue>6</issue><spage>982</spage><epage>992</epage><pages>982-992</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. 
Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2015.2416653</doi><tpages>11</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2015-06, Vol.23 (6), p.982-992 |
issn | 2329-9290 ; 2329-9304 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TASLP_2015_2416653 |
source | IEEE Electronic Library (IEL) |
subjects | Deep neural networks (DNNs) ; denoising ; dereverberation ; Noise reduction ; Reverberation ; spectral mapping ; Spectrogram ; Speech ; Speech processing ; supervised learning ; Time-domain analysis ; Training ; Voice recognition ; Voice response technology |
title | Learning Spectral Mapping for Speech Dereverberation and Denoising |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20Spectral%20Mapping%20for%20Speech%20Dereverberation%20and%20Denoising&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Kun%20Han&rft.date=2015-06&rft.volume=23&rft.issue=6&rft.spage=982&rft.epage=992&rft.pages=982-992&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2015.2416653&rft_dat=%3Cproquest_RIE%3E3759915841%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1699210179&rft_id=info:pmid/&rft_ieee_id=7067387&rfr_iscdi=true |
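
The abstract describes training a neural network to map the magnitude spectrogram of corrupted speech to that of clean speech. The following is a minimal toy sketch of that idea in Python with NumPy, not the authors' implementation. Everything specific in it is an assumption made for illustration: the synthetic "speech" signal, the simulated room response, the log compression, the context window of neighbouring frames, the single hidden layer, and all function names and hyperparameters.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """STFT magnitude, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def standardize(features):
    """Zero mean and unit variance per frequency bin (illustrative choice)."""
    return (features - features.mean(0)) / (features.std(0) + 1e-8)

def splice_context(features, context=2):
    """Concatenate each frame with `context` neighbours on either side."""
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(features)]
                      for i in range(2 * context + 1)])

def train_mapping(x, y, hidden=64, epochs=200, lr=0.01, seed=0):
    """Fit a one-hidden-layer ReLU network to y with full-batch gradient
    descent on the mean squared error. Returns parameters and loss history."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.1, (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.maximum(0, x @ w1 + b1)        # hidden activations
        err = (h @ w2 + b2) - y               # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2 * err / err.size           # gradient of the MSE
        g_h = (g_pred @ w2.T) * (h > 0)       # backprop through ReLU
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(0)
    return (w1, b1, w2, b2), losses

# Toy data (assumption): an amplitude-modulated tone stands in for clean speech.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))

# Simulated reverberation (decaying random impulse response) plus additive noise.
rir = rng.normal(size=400) * np.exp(-np.arange(400) / 80.0)
rir[0] = 1.0
noisy = np.convolve(clean, rir)[:len(clean)] + 0.05 * rng.normal(size=len(clean))

# Log-magnitude features: corrupted input with context, clean target per frame.
noisy_feats = splice_context(standardize(np.log(magnitude_spectrogram(noisy) + 1e-8)))
clean_feats = standardize(np.log(magnitude_spectrogram(clean) + 1e-8))

params, losses = train_mapping(noisy_feats, clean_feats)
print(f"MSE before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

In a realistic setting the input would be recorded or simulated reverberant, noisy speech, the network would be deeper, and the estimated magnitude would have to be combined with a phase estimate to resynthesize a waveform; none of those steps are shown here.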