Multichannel Blind Sound Source Separation Using Spatial Covariance Model With Level and Time Differences and Nonnegative Matrix Factorization

Detailed description

Bibliographic details
Published in:	IEEE/ACM transactions on audio, speech, and language processing, 2018-09, Vol.26 (9), p.1512-1527
Main authors:	Carabias-Orti, Julio Jose, Nikunen, Joonas, Virtanen, Tuomas, Vera-Candeas, Pedro
Format:	Article
Language:	English
Subjects:	Covariance matrices; Direction-of-arrival estimation; interaural level difference; interaural time difference; Kernel; Microphones; Multichannel source separation; non-negative matrix factorization; Source separation; spatial covariance model; Spectrogram; Time-frequency analysis

Description
Summary:	This paper presents an algorithm for multichannel sound source separation that explicitly models level and time differences in the source spatial covariance matrices (SCMs). We propose a novel SCM model in which the spatial properties are modeled by a weighted sum of direction-of-arrival (DOA) kernels. The DOA kernels are obtained by combining phase and level difference covariance matrices, which represent the time and level differences between microphones for a grid of predefined source directions. The proposed SCM model is combined with a nonnegative matrix factorization (NMF) model for the magnitude spectrograms. Unlike other SCM models in the literature, in this work source localization is defined implicitly in the model and estimated during the signal factorization, so no localization preprocessing is required. Parameters are estimated using complex-valued NMF with both the Euclidean distance and the Itakura-Saito divergence. The separation performance of the proposed system is evaluated on the two-channel SiSEC development dataset and on four-channel signals recorded in a regular room with moderate reverberation. Finally, a comparison with other state-of-the-art methods shows better separation performance in terms of SIR and perceptual measures.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2018.2830105
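
Illustrative sketch (not from the article): the summary describes a source SCM built as a weighted sum of DOA kernels over a grid of predefined directions. The Python below shows one way that idea could look for a single frequency under simplifying assumptions that are ours, not the authors': a far-field, free-field 2-D array, a speed of sound of 343 m/s, and rank-1 kernels formed as the outer product of a steering vector whose phases encode time differences and whose optional `gains` encode level differences. All function names, the microphone layout, and the weights are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def tdoa(mic_positions, angle_rad):
    """Far-field delay (seconds) of each microphone relative to the array origin."""
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_positions]


def doa_kernel(mic_positions, angle_rad, freq_hz, gains=None):
    """Hermitian M x M kernel a a^H with a_m = g_m * exp(-j 2 pi f tau_m)."""
    delays = tdoa(mic_positions, angle_rad)
    if gains is None:
        gains = [1.0] * len(mic_positions)  # no level difference assumed
    steer = [g * cmath.exp(-2j * math.pi * freq_hz * t)
             for g, t in zip(gains, delays)]
    return [[a * b.conjugate() for b in steer] for a in steer]


def source_scm(kernels, weights):
    """Weighted sum of DOA kernels; weights are assumed nonnegative."""
    m = len(kernels[0])
    scm = [[0j] * m for _ in range(m)]
    for kernel, w in zip(kernels, weights):
        for i in range(m):
            for j in range(m):
                scm[i][j] += w * kernel[i][j]
    return scm


# Hypothetical two-microphone array, 10 cm apart, on the x-axis.
mics = [(-0.05, 0.0), (0.05, 0.0)]
# Grid of candidate directions: 0, 45, 90, 135, 180 degrees.
angles = [math.radians(a) for a in range(0, 181, 45)]
kernels = [doa_kernel(mics, a, 1000.0) for a in angles]
# Direction weights concentrated around 90 degrees (broadside).
weights = [0.0, 0.1, 0.8, 0.1, 0.0]
scm = source_scm(kernels, weights)
```

In the method summarized above, the direction weights would be estimated jointly with the NMF spectrogram parameters during factorization; here they are fixed by hand purely to show the structure of the weighted sum.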