A Time-Frequency Masking Based Random Finite Set Particle Filtering Method for Multiple Acoustic Source Detection and Tracking

Detailed description

Bibliographic details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-12, Vol. 23 (12), p. 2356-2370
Main authors:	Zhong, Xionghu; Hopgood, James R.
Format:	Article
Language:	English
Subjects:	Acoustic source tracking; Atmospheric measurements; Hidden Markov models; Microphones; particle filtering (PF); Particle measurements; random finite set (RFS); Reverberation; room reverberation; Speech processing; Time of arrival estimation; time-delay of arrival

Description
Abstract:	Considering that multiple talkers may appear simultaneously, a time-frequency (TF) masking based random finite set (RFS) particle filtering (PF) method is developed for multiple acoustic source detection and tracking. The time-delay of arrival (TDOA) measurements of multiple sources are extracted using a time-frequency masking technique, in which each source's TF bins are clustered and separated in a joint gain-ratio and time-delay histogram. Since a joint detection and tracking problem is considered, both the source positions and the number of sources are time-varying and need to be estimated. The tracker is built within an RFS Bayesian filtering framework. Essentially, an RFS process is used to characterize the source dynamics, which include source appearance/disappearance and motion trajectories. Latent variables are also introduced to indicate source dynamics and measurement-source associations. Subsequently, a Rao-Blackwellization PF technique is employed so that the source position state can be marginalized and only the latent variables are estimated using the PF. The main advantage of the proposed approach is that hypothesis pruning is formulated in a fully probabilistic sense. The performance of the proposed approach is demonstrated on real speech recordings as well as in simulated room environments.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2015.2479041
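
Illustrative sketch (not taken from the article): the abstract mentions clustering TF bins in a joint gain-ratio and time-delay histogram. One common way to form such a histogram for a two-microphone pair is outlined below. The function names, the power weighting, the histogram ranges, and the sign convention for the delay are all assumptions made for this example; the authors' actual feature extraction may differ.

```python
import numpy as np


def gain_delay_features(X1, X2, omega, eps=1e-12):
    """Per-bin gain ratio and relative delay between two STFTs.

    X1, X2 : complex arrays of shape (n_freq, n_frames), hypothetical STFTs
             of the two microphone signals.
    omega  : nonzero angular frequencies (rad/sample), shape (n_freq,).
    Assumes X2 ~ a * exp(-1j * omega * delta) * X1 for a dominant source,
    so the delay is only unambiguous while |omega * delta| < pi.
    """
    # Guard against division by (near-)zero bins in the reference channel.
    safe_X1 = np.where(np.abs(X1) < eps, eps, X1)
    ratio = X2 / safe_X1
    gain = np.abs(ratio)
    delay = -np.angle(ratio) / omega[:, None]
    return gain, delay


def gain_delay_histogram(X1, X2, omega, gain_range=(0.0, 2.0),
                         delay_range=(-2.0, 2.0), bins=40):
    """Power-weighted 2-D histogram over (gain ratio, delay).

    Peaks in the returned histogram would correspond to candidate sources;
    the ranges and bin count are arbitrary choices for this sketch.
    """
    gain, delay = gain_delay_features(X1, X2, omega)
    weights = np.abs(X1) * np.abs(X2)
    hist, gain_edges, delay_edges = np.histogram2d(
        gain.ravel(), delay.ravel(), bins=bins,
        range=[gain_range, delay_range], weights=weights.ravel())
    return hist, gain_edges, delay_edges
```

In this sketch, TF bins dominated by the same source fall near a common (gain, delay) peak, and a binary mask for that source could then be formed by assigning each bin to its nearest peak. The delay of each peak would play the role of a TDOA measurement passed to a tracker.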