Localization and Characterization of Multiple Harmonic Sources

Bibliographic details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016-08, Vol.24 (8), p.1348-1363
Main authors:	Pessentheiner, Hannes; Hagmuller, Martin; Kubin, Gernot
Format:	Article
Language:	eng
Subjects:	Algorithms; data association; Direction of arrival; Direction-of-arrival estimation; Estimates; Frequency estimation; fundamental frequency; Harmonic analysis; Harmonics; Indexes; Joint estimation; microphone array; Microphones; pitch estimation; pitch-period doubling; Recall; Resonant frequency; Root-mean-square errors; sparse joint parameter space; Speech; Speech processing

Description
Abstract:	We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or 1° and smaller, which facilitates, e.g., distant-speech enhancement or source separation.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2016.2556282
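
Illustration:	The abstract builds on unbiased cross-correlation functions between pairs of microphone signals. The following is a minimal sketch of that single building block only, not the authors' algorithm: it does not reproduce the variable-scale sampling, the joint parameter space, or the multidimensional maxima detector. The function names, the two-microphone far-field model, the speed of sound, the microphone spacing, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def harmonic_signal(f0, n_harmonics, fs, n_samples, delay_samples=0):
    """Sum of equal-amplitude harmonics of f0, delayed by an integer number of samples."""
    t = (np.arange(n_samples) - delay_samples) / fs
    return sum(np.sin(2 * np.pi * h * f0 * t) for h in range(1, n_harmonics + 1))


def unbiased_xcorr(x, y, max_lag):
    """Unbiased estimate r[l] = sum_n x[n + l] * y[n] / (N - |l|) for |l| <= max_lag."""
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    full = np.correlate(x, y, mode="full")  # covers lags -(n - 1) .. (n - 1)
    center = n - 1                          # index of lag 0
    r = full[center - max_lag:center + max_lag + 1] / (n - np.abs(lags))
    return lags, r


def estimate_doa(x, y, fs, spacing):
    """Pick the correlation peak within the physically possible lags and map it to an angle.

    Assumes a single far-field source and two microphones `spacing` metres apart,
    with the angle measured from broadside (hypothetical convention).
    """
    max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * fs))
    lags, r = unbiased_xcorr(x, y, max_lag)
    lag = int(lags[np.argmax(r)])
    sin_theta = np.clip(SPEED_OF_SOUND * lag / (fs * spacing), -1.0, 1.0)
    return lag, float(np.degrees(np.arcsin(sin_theta)))


# Synthetic example: a 200 Hz harmonic source reaching microphone x 3 samples later than y.
fs = 16000
y = harmonic_signal(200.0, 5, fs, 8000)
x = harmonic_signal(200.0, 5, fs, 8000, delay_samples=3)
lag, angle = estimate_doa(x, y, fs, spacing=0.1)
print(lag, round(angle, 1))
```

Restricting the search to lags that the microphone spacing allows keeps the peak unique in this sketch, since the cross-correlation of a periodic signal repeats at every multiple of the pitch period.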