Audio-Visual Group Recognition Using Diffusion Maps

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, 2010-01, Vol. 58 (1), pp. 403-413
Authors: Keller, Y.; Coifman, R. R.; Lafon, S.; Zucker, S. W.
Format: Article
Language: English
Abstract: Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resulting structures are then combined in diffusion coordinates. In particular, because different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. This is a fundamental issue in multisensor acquisition and processing that was overlooked in prior approaches. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio-visual speech recognition.
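The abstract outlines the pipeline: embed each sensor channel with a density-invariant diffusion map (the Laplace-Beltrami normalization, i.e. the alpha = 1 case of the diffusion-maps framework), then fuse the channels in diffusion coordinates. The following is a minimal sketch of that idea, not the paper's implementation; the kernel scales, feature dimensions, and the simple concatenation-based fusion are illustrative assumptions.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=3, alpha=1.0):
    """Diffusion-map embedding; alpha=1 gives the density-invariant
    Laplace-Beltrami normalization mentioned in the abstract."""
    # Gaussian kernel on pairwise squared distances
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    # alpha-normalization removes the effect of sampling density
    q = K.sum(axis=1)
    K = K / (q[:, None]**alpha * q[None, :]**alpha)
    # Symmetric conjugate of the Markov matrix for a stable eigensolve
    d = K.sum(axis=1)
    S = K / np.sqrt(d[:, None] * d[None, :])
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of the Markov matrix; drop the constant one
    psi = vecs / np.sqrt(d)[:, None]
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]

# Hypothetical fusion: embed each channel separately, then combine
# the per-channel diffusion coordinates (here by concatenation).
rng = np.random.default_rng(0)
audio = rng.standard_normal((200, 12))   # stand-in audio features
video = rng.standard_normal((200, 30))   # stand-in visual features
fused = np.hstack([diffusion_map(audio, eps=10.0),
                   diffusion_map(video, eps=20.0)])
print(fused.shape)  # (200, 6)
```

Because each channel is embedded before fusion, the combined coordinates are insensitive to the channels' differing feature scales and sampling densities, which is the point the abstract emphasizes.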
ISSN: 1053-587X, 1941-0476
DOI: 10.1109/TSP.2009.2030861