Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues

Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manner...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Sensors (Basel, Switzerland) Switzerland), 2019-07, Vol.19 (14), p.3056
Hauptverfasser: Hwang, Soojoong, Jin, Yu Gwang, Shin, Jong Won
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manners by background noises and reverberation and therefore can have complementary information. Conventional approaches utilize the statistics from all frequencies with fixed weight, although the information from some time-frequency bins may degrade the performance of VAD. In this letter, we propose a dual microphone VAD scheme based on the spatial cues in reliable frequency bins only, considering the sparsity of the speech signal in the time-frequency domain. The reliability of each time-frequency bin is determined by three conditions on signal energy, ILD, and ITD. ITD-based and ILD-based VADs and statistics are evaluated using the information from selected frequency bins and then combined to produce the final VAD results. Experimental results show that the proposed frequency selective approach enhances the performances of VAD in realistic environments.
ISSN:1424-8220
1424-8220
DOI:10.3390/s19143056