IRM estimation based on data field of cochleagram for speech enhancement

Bibliographic details
Published in: Speech Communication, 2018-03, Vol. 97, pp. 19-31
Main authors: Wang, Xianyun; Bao, Feng; Bao, Changchun
Format: Article
Language: English
Description
Summary: Computational auditory scene analysis (CASA) can mask noise effectively in speech enhancement, provided the mask is estimated accurately. This paper applies ideal ratio mask (IRM) estimation based on spectral dependency to the speech cochleagram. To capture the spectral dependency, the concept of a data field (DF) is introduced to model the time-frequency (T-F) relationships of the cochleagram, so that the results (termed potentials), which carry adjacent spectral information, are ultimately used to estimate the IRM. The estimation framework first uses a pre-processing module to obtain initial T-F values of noise and speech. Given these initial estimates, the DF model yields the forms of the speech and noise potentials, which are viewed as the energy of a T-F unit combined with information from its neighbors. Based on these forms, the optimal potentials, which reflect the respective optimal distributions of speech and noise, are then obtained through optimal influence factors. Finally, the masking value is computed from the speech and noise potentials to restore the clean target speech signal. The algorithm is evaluated against reference methods and yields an effective improvement in speech quality.
ISSN: 0167-6393, 1872-7182
DOI: 10.1016/j.specom.2017.12.014
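
The pipeline outlined in the summary (initial T-F estimates, then neighbor-aware potentials, then a ratio mask) can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything here is an assumption: the Gaussian-like potential kernel is the form commonly used in data field theory, the mask uses the common IRM definition with exponent 0.5, and the influence factor `sigma` is fixed rather than optimized as in the paper. The names `data_field_potential` and `ratio_mask` are hypothetical.

```python
import numpy as np


def data_field_potential(energy, sigma=1.0, radius=2):
    """Sum each T-F unit's neighbors, weighted by exp(-(distance / sigma)**2).

    Assumed kernel form (common in data field theory); `sigma` plays the
    role of the influence factor and is fixed here, not optimized.
    """
    n_ch, n_fr = energy.shape
    padded = np.pad(energy, radius, mode="constant")  # zeros outside the cochleagram
    potential = np.zeros_like(energy, dtype=float)
    for dc in range(-radius, radius + 1):        # offset along channels
        for dt in range(-radius, radius + 1):    # offset along frames
            weight = np.exp(-(np.hypot(dc, dt) / sigma) ** 2)
            shifted = padded[radius + dc: radius + dc + n_ch,
                             radius + dt: radius + dt + n_fr]
            potential += weight * shifted
    return potential


def ratio_mask(speech_pot, noise_pot, beta=0.5, eps=1e-12):
    """Ratio mask from speech and noise potentials (assumed IRM-style form)."""
    return (speech_pot / (speech_pot + noise_pot + eps)) ** beta


# Hypothetical initial estimates standing in for the pre-processing output.
rng = np.random.default_rng(0)
speech_energy = rng.random((8, 10))   # 8 channels x 10 frames
noise_energy = rng.random((8, 10))

mask = ratio_mask(data_field_potential(speech_energy),
                  data_field_potential(noise_energy))
enhanced = mask * (speech_energy + noise_energy)  # mask applied to the mixture energy
```

Because each potential pools energy from neighboring T-F units, the resulting mask varies more smoothly across time and frequency than one computed from the raw per-unit estimates; how the paper selects the optimal influence factors is not stated in this record.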