Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2016-08, Vol. 23 (8), pp. 1101-1105
Main authors: Syu-Siang Wang, Alan Chern, Yu Tsao, Jeih-weih Hung, Xugang Lu, Ying-Hui Lai, Borching Su
Format: Article
Language: English
container_end_page 1105
container_issue 8
container_start_page 1101
container_title IEEE signal processing letters
container_volume 23
creator Syu-Siang Wang
Chern, Alan
Yu Tsao
Jeih-weih Hung
Xugang Lu
Ying-Hui Lai
Borching Su
description For state-of-the-art speech enhancement (SE) techniques, a spectrogram is usually preferred over the corresponding raw time-domain data, since it provides a more compact representation together with salient temporal information over a long time span. However, two problems can cause distortions in conventional nonnegative matrix factorization (NMF)-based SE algorithms. One is related to the overlap-and-add operation used in short-time Fourier transform (STFT)-based signal reconstruction, and the other arises from directly using the phase of the noisy speech as the phase of the enhanced speech during reconstruction. Both problems can cause information loss or discontinuities in the reconstructed signal relative to the clean signal. To solve them, we propose a novel SE method that combines the discrete wavelet packet transform (DWPT) with NMF. In brief, the DWPT is first applied to split a time-domain speech signal into a series of subband signals. Then, NMF is used to highlight the speech component in each subband. These enhanced subband signals are combined via the inverse DWPT to reconstruct a noise-reduced signal in the time domain. We evaluate the proposed DWPT-NMF-based SE method on the Mandarin Hearing in Noise Test (MHINT) task. Experimental results show that the new method effectively improves speech quality and intelligibility and outperforms the conventional STFT-NMF-based SE system.
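The abstract describes a three-stage pipeline: DWPT analysis, NMF applied per subband, and inverse DWPT synthesis. The Python/NumPy sketch below illustrates that pipeline under assumptions this record does not specify: a Haar wavelet packet with a full tree of depth 3, each subband's coefficients cut into non-overlapping frames of 16 samples, supervised NMF with per-subband speech and noise bases learned by Euclidean multiplicative (Lee-Seung) updates, and a Wiener-like soft mask applied to the signed coefficients. All function names (`dwpt`, `idwpt`, `enhance`, etc.) and parameter values are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def haar_split(x):
    """One orthonormal Haar analysis step: (approximation, detail), each half as long."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_merge(a, d):
    """Exact inverse of haar_split."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x


def dwpt(x, level):
    """Full Haar packet tree: split every branch, giving 2**level subbands.
    len(x) must be a multiple of 2**level. Bands come out in natural
    (tree) order, not sorted by frequency."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(level):
        bands = [half for b in bands for half in haar_split(b)]
    return bands


def idwpt(bands):
    """Merge sibling subbands pairwise, level by level, back to one signal."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


def frame(c, F):
    """Cut a subband into non-overlapping length-F columns (zero-padded tail)."""
    T = -(-len(c) // F)  # ceiling division
    return np.pad(c, (0, T * F - len(c))).reshape(T, F).T  # shape (F, T)


def unframe(M, n):
    """Undo frame(): concatenate columns and drop the padding."""
    return M.T.reshape(-1)[:n]


def nmf_train(V, k, n_iter=200, eps=1e-10, seed=0):
    """Learn a nonnegative basis W (V ~ W H) with Euclidean multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W


def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate activations H >= 0 for a fixed basis W."""
    H = np.random.default_rng(seed).random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def enhance(noisy, speech_train, noise_train, level=3, F=16, k=8):
    """Hypothetical DWPT-NMF enhancer: per-subband supervised NMF + soft mask."""
    out = []
    for y, s, v in zip(dwpt(noisy, level), dwpt(speech_train, level), dwpt(noise_train, level)):
        Ws = nmf_train(np.abs(frame(s, F)), k)  # speech basis for this subband
        Wn = nmf_train(np.abs(frame(v, F)), k)  # noise basis for this subband
        Y = frame(y, F)
        H = nmf_activations(np.abs(Y), np.hstack([Ws, Wn]))
        S, N = Ws @ H[:k], Wn @ H[k:]
        mask = S / (S + N + 1e-10)  # gain in [0, 1), applied to signed coefficients
        out.append(unframe(mask * Y, len(y)))
    return idwpt(out)
```

For example, `enhance(noisy, speech_train, noise_train)` returns a signal of the same length as `noisy`, provided all three lengths are multiples of `2**level`. Because the Haar packet transform is orthonormal and every gain lies in [0, 1), the sketch can only attenuate energy. It operates directly on real-valued subband coefficients, so there is no separate phase to borrow, and with non-overlapping frames there is no overlap-add step.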
doi_str_mv 10.1109/LSP.2016.2571727
format Article
publisher New York: IEEE
coden ISPLEM
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016
linktorsrc https://ieeexplore.ieee.org/document/7476850
fulltext fulltext_linktorsrc
identifier ISSN: 1070-9908
ispartof IEEE signal processing letters, 2016-08, Vol.23 (8), p.1101-1105
issn 1070-9908
1558-2361
language eng
recordid cdi_crossref_primary_10_1109_LSP_2016_2571727
source IEEE Electronic Library (IEL)
subjects Algorithms
Discrete wavelet packet transform (DWPT)
Distortion
Estimation
Factorization
Fourier transforms
Mandarins
Noise measurement
nonnegative matrix factorization (NMF)
short-time Fourier transform (STFT)
Signal reconstruction
Spectrogram
Speech
Speech enhancement
speech enhancement (SE)
Speech processing
Time domain
Time-domain analysis
Wavelet
title Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T18%3A57%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Wavelet%20Speech%20Enhancement%20Based%20on%20Nonnegative%20Matrix%20Factorization&rft.jtitle=IEEE%20signal%20processing%20letters&rft.au=Syu-Siang%20Wang&rft.date=2016-08&rft.volume=23&rft.issue=8&rft.spage=1101&rft.epage=1105&rft.pages=1101-1105&rft.issn=1070-9908&rft.eissn=1558-2361&rft.coden=ISPLEM&rft_id=info:doi/10.1109/LSP.2016.2571727&rft_dat=%3Cproquest_RIE%3E4118096571%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1804184646&rft_id=info:pmid/&rft_ieee_id=7476850&rfr_iscdi=true