Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues

Bibliographic Details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2013-04, Vol. 21 (4), pp. 806-815
Main authors: Woodruff, J.; Wang, DeLiang
Format: Article
Language: English
Description
Summary: We propose an approach to binaural detection, localization and segregation of speech based on pitch and azimuth cues. We formulate the problem as a search through a multisource state space across time, where each multisource state encodes the number of active sources, and the azimuth and pitch of each active source. A set of multilayer perceptrons is trained to assign time-frequency units to one of the active sources in each multisource state based jointly on observed pitch and azimuth cues. We develop a novel hidden Markov model framework to estimate the most probable path through the multisource state space. An estimated state path encodes a solution to the detection, localization, pitch estimation and simultaneous organization problems. Segregation is then achieved with an azimuth-based sequential organization stage. We demonstrate that the proposed framework improves segregation relative to several two-microphone comparison systems that are based solely on azimuth cues. Performance gains are consistent across a variety of reverberant conditions.
ISSN: 1558-7916, 1558-7924
DOI: 10.1109/TASL.2012.2236316
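
The summary describes estimating the most probable path through a state space with a hidden Markov model. As a rough illustration of that general idea only, the sketch below runs the textbook Viterbi recursion on a toy problem. It is not the authors' framework: the two states, the transition probabilities, and the per-frame likelihoods are all made-up assumptions, whereas in the paper each state would encode the number of active sources and the azimuth and pitch of each one.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the most probable state path and its log-probability.

    log_init[s]     : log prior of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_obs[t][s]   : log likelihood of the frame-t observation under state s
    """
    n_states = len(log_init)
    # score[s] is the best log-probability of any path ending in state s.
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []  # backptr[t-1][s] is the best predecessor of s at frame t
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
        score = new_score
        backptr.append(ptr)
    # Trace the best final state back to the first frame.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def to_log(rows):
    """Convert a nested list of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical toy setup: state 0 and state 1 stand in for two candidate
# multisource states; all numbers are invented for illustration.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = to_log([[0.7, 0.3], [0.3, 0.7]])
log_obs = to_log([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])

path, logp = viterbi(log_init, log_trans, log_obs)
```

Here `path` holds one state index per frame and `logp` is the log-probability of that path. Working in the log domain avoids numerical underflow on long sequences, which is the usual reason for this design choice.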