DCU-Net transient noise suppression based on joint spectrum estimation
Published in: | Signal, image and video processing, 2023-10, Vol.17 (7), p.3265-3273 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Transient noise has high short-time energy, a high degree of randomness, and a wide frequency-domain distribution, and it pollutes the signal only locally. Traditional denoising methods usually assume a particular relationship between speech and noise, and this assumption does not necessarily match real-life scenarios. As a result, traditional denoising methods do not suppress transient noise effectively. This paper therefore proposes a new denoising scheme. First, starting from the conventional optimally-modified log-spectral amplitude (OM-LSA) estimation algorithm, the minima controlled recursive averaging algorithm is replaced by an improved mean recurrence time algorithm to estimate the transient noise spectrum. Second, transient noise segments are identified using thresholds and fed into a deep complex-valued U-Net (DCU-Net) network for speech enhancement. Third, the enhanced results are inserted back into the original sequence to reconstruct the denoised speech signal. Finally, the method is tested on speech from the Voice Bank corpus and a self-made noise dataset. The test results show that, in 0 dB, −5 dB, and −10 dB environments, the proposed method improves the segmental signal-to-noise ratio, the perceptual evaluation of speech quality (PESQ), and the short-time objective intelligibility (STOI) relative to the traditional OM-LSA algorithm. At a signal-to-noise ratio of −10 dB, the segmental signal-to-noise ratio is improved by 9.8%. These results indicate that the proposed method can effectively suppress transient noise at low signal-to-noise ratios while also improving speech quality. |
---|---|
ISSN: | 1863-1703 1863-1711 |
DOI: | 10.1007/s11760-023-02541-y |
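
The detect, enhance, and reinsert steps described in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: plain frame energy stands in for the OM-LSA-based transient noise spectrum estimate, and the hypothetical `attenuate` function stands in for the DCU-Net enhancer. The names `frame_energies`, `find_transient_segments`, and `suppress_transients`, as well as the frame length and threshold, are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


def frame_energies(signal: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def find_transient_segments(
    signal: List[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges covering runs of frames above the threshold.

    Assumption: frame energy replaces the paper's transient noise estimate.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    energies = frame_energies(signal, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i * frame_len              # a transient run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # the run ends
            start = None
    if start is not None:                      # run reaches the last full frame
        segments.append((start, len(energies) * frame_len))
    return segments


def suppress_transients(
    signal: List[float],
    frame_len: int,
    threshold: float,
    enhance: Callable[[List[float]], List[float]],
) -> List[float]:
    """Enhance only the flagged segments and insert them back in place."""
    output = list(signal)
    for start, end in find_transient_segments(signal, frame_len, threshold):
        enhanced = enhance(signal[start:end])
        if len(enhanced) != end - start:
            raise ValueError("enhancer must preserve segment length")
        output[start:end] = enhanced           # samples outside stay untouched
    return output


def attenuate(segment: List[float]) -> List[float]:
    """Hypothetical placeholder for the DCU-Net enhancer: simple attenuation."""
    return [0.1 * x for x in segment]


if __name__ == "__main__":
    noisy = [0.1] * 8 + [1.0] * 4 + [0.1] * 8  # one loud burst in the middle
    print(find_transient_segments(noisy, frame_len=4, threshold=0.5))
    print(suppress_transients(noisy, 4, 0.5, attenuate))
```

Processing only the flagged segments reflects the summary's observation that transient noise pollutes the signal locally, so the rest of the sequence is passed through unchanged.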