DCU-Net transient noise suppression based on joint spectrum estimation

Bibliographic details
Published in: Signal, image and video processing, 2023-10, Vol.17 (7), p.3265-3273
Main authors: Lan, Chaofeng, Zhao, Shilong, Zhang, Lei, Chen, Huan, Guo, Rui, Si, Zhenfei, Guo, Xiaoxia, Han, Chuang, Zhang, Meng
Format: Article
Language: English
Online access: Full text
Description
Abstract: Transient noise has high short-time energy, a high degree of randomness, and a wide frequency-domain distribution, and it pollutes the signal only locally. Traditional denoising methods usually assume a particular relationship between speech and noise, and this assumption does not necessarily match real-life scenarios. As a result, traditional denoising methods do not suppress transient noise effectively. This paper therefore proposes a new denoising scheme. First, based on the conventional optimally-modified log-spectral amplitude (OM-LSA) estimation algorithm, the minima controlled recursive averaging algorithm is replaced by the improved mean recurrence time algorithm, and the transient noise spectrum is estimated. Second, transient noise segments are determined using thresholds and fed into a deep complex-valued U-Net (DCU-Net) for speech enhancement. Third, the enhanced results are inserted back into the original sequence to reconstruct the denoised speech signal. Finally, experiments are performed using speech from the Voice Bank corpus and a self-built noise dataset. The results show that, in 0 dB, −5 dB, and −10 dB environments, the proposed method improves on the traditional OM-LSA algorithm in segmental signal-to-noise ratio, perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI). At a signal-to-noise ratio of −10 dB, the segmental signal-to-noise ratio improves by 9.8%. These results indicate that the proposed method can reliably suppress transient noise at low signal-to-noise ratios while also improving speech quality.
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-023-02541-y
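
The detect, enhance, and reinsert flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract detects transient segments by thresholding a noise spectrum estimated with a modified OM-LSA algorithm; here a much simpler short-time energy threshold is swapped in for that step, and a plain attenuation function stands in for the DCU-Net. All function names, the frame length, and the threshold ratio are assumptions chosen only for illustration.

```python
import numpy as np


def detect_transient_frames(signal, frame_len=256, threshold_ratio=4.0):
    """Flag frames whose mean energy exceeds threshold_ratio times the median.

    Assumption: a simple energy detector, used here in place of the
    OM-LSA-based noise spectrum estimate mentioned in the abstract.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold_ratio * np.median(energy)


def placeholder_enhancer(segment):
    """Hypothetical stand-in for the enhancement network: attenuate by 20 dB."""
    return 0.1 * segment


def suppress_transients(signal, enhancer=placeholder_enhancer,
                        frame_len=256, threshold_ratio=4.0):
    """Enhance only the flagged frames and write them back in place."""
    signal = np.asarray(signal, dtype=float)
    flags = detect_transient_frames(signal, frame_len, threshold_ratio)
    output = signal.copy()  # unflagged frames and any tail samples stay as-is
    for i in np.flatnonzero(flags):
        start, stop = i * frame_len, (i + 1) * frame_len
        output[start:stop] = enhancer(signal[start:stop])
    return output, flags


# Toy input: a quiet 220 Hz tone with one loud noise burst in frame 16.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
noisy = 0.1 * np.sin(2 * np.pi * 220 * t)
noisy[4096:4352] += rng.normal(0.0, 1.0, 256)

output, flags = suppress_transients(noisy)
print("flagged frames:", np.flatnonzero(flags))
```

Because only flagged frames are passed to the enhancer, samples outside the detected segments are left exactly as they were, which matches the local nature of transient noise noted in the abstract.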