D2L2-Dense LSTM Deep Learning Based Nonlinear Acoustic Echo Cancellation
Published in: | Traitement du signal 2024-08, Vol.41 (4), p.1823-1834 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng ; fre |
Subjects: | |
Online access: | Full text |
Abstract: | Speech quality is a crucial concern because voice communication is a prominent and ubiquitous part of everyday life. Audible echo is one of the factors that degrade quality, and both network hardware and end-user devices are intrinsically prone to it. Designing efficient acoustic echo cancellation (AEC) devices is therefore vital for improving listening comfort and voice quality. When inexpensive, small analog components are used, an echo canceller performs poorly or fails altogether if the net nonlinear distortion exceeds a certain level. Adaptive filters are commonly used to remove the echo from the microphone signal, but achieving the best AEC performance under real-time conditions remains difficult. In this work, we propose nonlinear acoustic echo cancellation (NAEC) using dense long short-term memory (LSTM)-based deep learning (D2L2). Deep learning has been applied to speech source separation (SSS). In our deep-learning-based NAEC, the near-end signal is separated from the microphone signal by training LSTM layers. Before training begins, the Short-Time Fourier Transform (STFT) is used to extract time-frequency features from the acoustic signal. Two training targets are assigned in D2L2: the spectral Magnitude Mask (MM) as the primary target and the Near-end Signal Mask (NSM) as the secondary target. Simulations show that D2L2 achieves a higher Echo Return Loss Enhancement (ERLE) than other works. |
---|---|
ISSN: | 0765-0019 ; 1958-5608 |
DOI: | 10.18280/ts.410414 |
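
The abstract names three quantities without defining them in this record: STFT features, a spectral magnitude mask, and ERLE. The sketch below is illustrative only and is not taken from the article. It assumes a Hann-windowed STFT with a 512-sample frame and 256-sample hop, takes the magnitude mask to be the ratio of near-end to microphone STFT magnitudes (a common definition in the speech separation literature, assumed here), and uses the standard ERLE definition of 10·log10 of microphone power over residual power. All function names and parameters are hypothetical.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins).

    frame_len and hop are assumed values, not taken from the article.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def magnitude_mask(near_end, mic, eps=1e-8):
    """Assumed mask definition: |S_near| / |Y_mic| per time-frequency bin."""
    return np.abs(stft(near_end)) / (np.abs(stft(mic)) + eps)


def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: 10 * log10(power of mic signal / power of residual).

    Usually evaluated on far-end-only (single-talk) segments.
    """
    return 10.0 * np.log10((np.mean(mic**2) + eps) / (np.mean(residual**2) + eps))


# Synthetic single-talk example: the microphone picks up only an echo,
# and a hypothetical canceller attenuates it by a factor of 10 in amplitude.
rng = np.random.default_rng(0)
echo = 0.5 * rng.standard_normal(16000)
residual = 0.1 * echo
print(f"ERLE: {erle_db(echo, residual):.2f} dB")
```

In a mask-based system of the kind the abstract describes, a network would predict a mask from microphone and far-end features, and the predicted mask would be multiplied with the microphone STFT before inversion to the time domain. That training step is not shown here.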