Binaural-Projection Multichannel Wiener Filter for Cue-Preserving Binaural Speech Enhancement
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, Vol. 31, pp. 3730-3745 |
Main authors: | , |
Format: | Article |
Language: | English |
Abstract: | Previous research in binaural speech enhancement has demonstrated a need for binaural cue preservation beyond the requirements of noise suppression and speech quality. State-of-the-art binaural methods are frequently grouped into two classes: spatio-temporal optimum filters with composite cost functions designed to balance simultaneous requirements, and common-gain spectral filters that preserve cues exactly by construction. In this article, we pursue spatio-temporal filtering by convex MMSE estimation constrained to strict binaural cue preservation. To this end, we rely on a frequency-domain representation of the well-known interaural level differences (ILD) and interaural time differences (ITD) to formulate a complex-valued constraint. We then show that the resulting spatial filter in fact falls into the class of common-gain spectral filtering, where the gain is a new arrangement of two spectral weightings related to the acoustic transfer function (ATF) and the power spectral density (PSD), respectively. Moreover, we show that it is equivalent to an unconstrained multiple-input/multiple-output multichannel Wiener filter (MIMO-MWF) combined with a binaural projection onto the original noisy spatial cues, hence the name binaural-projection multichannel Wiener filter (BP-MWF) for the proposed solution. Experimental results in terms of ILD/ITD spectral histograms and distance metrics confirm that BP-MWF achieves the desired spatial cue preservation. Regarding noise suppression and speech quality, BP-MWF improves the instrumental segSNR, PESQ, and STOI metrics over the binaural state of the art, such as the partial-noise-estimation forms of MVDR and MWF, and is competitive with the unconstrained MIMO-MWF, which serves as an upper bound. Finally, the results are supported by a formal listening test covering various SNRs, source directions, and noise types. |
ISSN: | 2329-9290 2329-9304 |
DOI: | 10.1109/TASLP.2023.3317569 |
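The abstract's key structural claim is that a filter applying the same real-valued gain to both ear signals preserves ILD and ITD by construction. The sketch below illustrates only that property, under stated assumptions. It measures ILD as 20 log10 of the per-bin interaural magnitude ratio and uses the interaural phase difference (IPD) as a frequency-domain proxy for ITD. The STFT values, SNR values, and helper names (`wiener_gain`, `interaural_cues`, `apply_gains`) are hypothetical. This is not the BP-MWF itself, whose gain combines ATF- and PSD-related weightings as described in the article.

```python
import cmath
import math


def wiener_gain(snr):
    """Single-channel Wiener gain SNR/(1+SNR) for an assumed a-priori SNR."""
    return snr / (1.0 + snr)


def interaural_cues(y_left, y_right):
    """Per-bin ILD (dB) and IPD (rad) from the complex interaural ratio Y_L / Y_R.

    The IPD serves here as a hypothetical stand-in for the ITD
    (ITD = IPD / omega in bins without phase wrapping).
    """
    ilds, ipds = [], []
    for yl, yr in zip(y_left, y_right):
        ratio = yl / yr  # complex interaural ratio for this bin
        ilds.append(20.0 * math.log10(abs(ratio)))
        ipds.append(cmath.phase(ratio))
    return ilds, ipds


def apply_gains(y_left, y_right, g_left, g_right):
    """Scale each ear's STFT bins by its own real gain sequence."""
    out_l = [g * y for g, y in zip(g_left, y_left)]
    out_r = [g * y for g, y in zip(g_right, y_right)]
    return out_l, out_r


# Hypothetical noisy STFT coefficients of one frame (three bins) per ear.
y_left = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
y_right = [0.8 + 0.5j, 0.4 + 0.1j, -0.1 + 0.6j]

# Hypothetical per-bin a-priori SNR estimates shared by both ears.
gains = [wiener_gain(s) for s in (4.0, 1.0, 0.25)]

ild_in, ipd_in = interaural_cues(y_left, y_right)

# Common gain: the identical real weight on both ears cancels in Y_L / Y_R.
zl, zr = apply_gains(y_left, y_right, gains, gains)
ild_common, ipd_common = interaural_cues(zl, zr)

# Independent per-ear gains (hypothetical right-ear SNRs) distort the ILD.
gains_right = [wiener_gain(s) for s in (1.0, 1.0, 1.5)]
zl2, zr2 = apply_gains(y_left, y_right, gains, gains_right)
ild_indep, ipd_indep = interaural_cues(zl2, zr2)

for k in range(len(y_left)):
    print(f"bin {k}: ILD in {ild_in[k]:+.2f} dB, "
          f"common {ild_common[k]:+.2f} dB, "
          f"independent {ild_indep[k]:+.2f} dB")
```

Because all gains in this sketch are real and positive, the IPD is unchanged in both cases; only unequal gain magnitudes across the ears shift the ILD, by 20 log10(g_L / g_R) per bin. A common gain makes that term vanish, which is why the common-gain class preserves these cues exactly.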