Minimum Processing Beamforming

Bibliographic details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, Vol. 29, p. 2710-2724
Main authors: Zahedi, Adel; Pedersen, Michael Syskind; Ostergaard, Jan; Christiansen, Thomas Ulrich; Bramslow, Lars; Jensen, Jesper
Format: Article
Language: English
Description
Summary: Most of the well-known classic beamformers have resulted from optimization problems that minimize a cost function such as the mean-square error (MSE) between the noisy speech and a reference clean speech. The rationale behind these formulations involves a speech-versus-noise dichotomy, where anything branded as noise shall be suppressed as much as possible. While leading to simple closed-form solutions and reasonably practical beamformers, this rationale has its own limitations, for instance, when the ambient noise provides context and is therefore not entirely undesirable. In this article, we offer a new rationale, where the output of the beamformer is minimally processed with respect to a certain reference signal, as long as a given performance criterion is fulfilled. We provide a case study where the performance criterion is inspired by the Speech Intelligibility Index (SII), and the processing penalty is MSE. Regarding the reference signal, we consider two cases. In the first case, the reference signal is set to the unprocessed recording from a reference microphone, giving rise to a beamformer that limits the processing of the noisy signal to a minimum necessary for fulfilling the intelligibility requirement. For the second case, the reference signal is the output of an aggressive beamformer, yielding a beamformer that essentially eliminates the noise unless the concomitant distortion of the clean speech violates the intelligibility requirement. Through simulation studies, we demonstrate some of the benefits that each of the two cases offers in relevant contexts.
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2021.3053411
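
The rationale described in the summary can be read as a constrained optimization problem. The following is only a schematic sketch of that reading, not the formulation used in the article: the symbols $\mathbf{x}$ (noisy microphone vector in one time-frequency bin), $\mathbf{w}$ (beamformer weights), $r$ (reference signal), $\mathcal{I}(\cdot)$ (an SII-inspired performance measure), $\mathcal{I}_0$ (its required minimum), $\mathbf{e}_{\mathrm{ref}}$ (selection vector for the reference microphone), and $\mathbf{w}_{\mathrm{agg}}$ (weights of the aggressive beamformer) are all assumed notation introduced here for illustration.

```latex
% Schematic sketch; all symbols are assumed notation, not taken from the article.
\begin{align}
  \min_{\mathbf{w}} \quad
    & \mathbb{E}\!\left[\,\bigl|\mathbf{w}^{H}\mathbf{x} - r\bigr|^{2}\right]
    && \text{(processing penalty: MSE to the reference)} \\
  \text{subject to} \quad
    & \mathcal{I}(\mathbf{w}) \ge \mathcal{I}_{0}
    && \text{(performance criterion)}
\end{align}
% Case 1: reference = unprocessed reference-microphone recording
%   r = \mathbf{e}_{\mathrm{ref}}^{H}\mathbf{x}
% Case 2: reference = output of an aggressive beamformer
%   r = \mathbf{w}_{\mathrm{agg}}^{H}\mathbf{x}
```

Under this reading, whenever the reference itself already satisfies the constraint, the minimizer reproduces the reference ($\mathbf{w} = \mathbf{e}_{\mathrm{ref}}$ or $\mathbf{w} = \mathbf{w}_{\mathrm{agg}}$) at zero penalty, and the output departs from the reference only as far as the performance criterion requires.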