Delay-and-Sum Beamforming-Based Spatial Mapping for Multisource Sound Localization

Bibliographic details
Published in:	IEEE Internet of Things Journal, May 2024, Vol. 11, No. 9, pp. 16048-16060
Authors:	He, Changjiang; Cheng, Siyao; Zheng, Rong; Liu, Jie
Format:	Article
Language:	English
Subject terms:	Angles (geometry); Array signal processing; Artificial neural networks; Audio data; Audio processing; Audio signals; Augmented reality; Azimuth; Beamforming; deep learning (DL); delay and sum beamforming; Direction of arrival; direction of arrival (DOA); Direction-of-arrival estimation; Errors; Estimation; Fault detection; Feature extraction; Internet of Things; Localization; Location awareness; Machine learning; Mapping; Microphones; Neural networks; Redundancy; Scene analysis; Sound localization; Sound sources; Spatial data

Description
Abstract:	Multisource sound localization has applications in many domains, including auditory scene analysis, fault detection and diagnosis in manufacturing, and augmented reality. In the far field, 3-D sound source localization is equivalent to finding the direction of arrival (DOA), namely, the azimuth and elevation angles of the sound sources. Recent DOA estimation pipelines take multichannel audio as input, extract spectral features from each channel, and then feed them into a deep neural network. Unfortunately, spectral features contain only the time-frequency information of the audio signals, while spatial information is captured only implicitly in the signals across different channels, which depends heavily on the acoustic array geometry. To embed the spatial information of the sound sources into the spectral feature representation, we propose a spatial mapping method based on delay-and-sum beamforming (DSB) that encodes sound source location information. It can be combined with different feature extraction methods and machine learning models for DOA estimation. Furthermore, a redundancy removal procedure is proposed to accelerate DSB computation so that the pipeline can run in real time on embedded GPUs such as the NVIDIA Jetson Nano. We conduct extensive experiments using two neural network models together with the DSB method on two data sets. The experiments demonstrate that the DSB method effectively reduces DOA errors: when DSB is combined with feature extraction, DOA errors are reduced by up to 19.24%. In addition, redundancy removal accelerates the feature extraction process by up to 30.42%.
ISSN:	2327-4662
DOI:	10.1109/JIOT.2024.3352051
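The abstract does not detail how the DSB spatial map is built. As a minimal sketch, assuming a textbook far-field, frequency-domain delay-and-sum beamformer, the Python code below steers a small array over an azimuth/elevation grid and records the beamformer output power at each direction, so that the power peak marks the DOA. The array geometry (a hypothetical 4-microphone tetrahedron of 4 cm radius), the speed of sound (343 m/s), the frequency bins, the 10° grid, the noise-free single-source simulation, and all function names (`unit_vector`, `arrival_delays`, `simulate_spectra`, `dsb_power`, `dsb_spatial_map`) are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a textbook far-field DSB power map, not the authors' method.
import cmath
import math

# Assumed constant for this sketch (not taken from the paper).
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C


def unit_vector(az_deg, el_deg):
    """Unit vector pointing from the array origin toward (azimuth, elevation)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def arrival_delays(mics, az_deg, el_deg, c=SPEED_OF_SOUND):
    """Far-field arrival delay (s) at each mic relative to the array origin.

    A mic displaced toward the source hears the plane wave earlier,
    so tau_m = -(p_m . u) / c.
    """
    u = unit_vector(az_deg, el_deg)
    return [-sum(p_i * u_i for p_i, u_i in zip(p, u)) / c for p in mics]


def simulate_spectra(mics, freqs, az_deg, el_deg, amplitude=1.0):
    """Noise-free snapshot X[m][k] = A * exp(-j 2 pi f_k tau_m) for one plane wave."""
    taus = arrival_delays(mics, az_deg, el_deg)
    return [[amplitude * cmath.exp(-2j * math.pi * f * tau) for f in freqs] for tau in taus]


def dsb_power(spectra, mics, freqs, az_deg, el_deg):
    """Delay-and-sum output power steered at (az, el), summed over frequency bins."""
    taus = arrival_delays(mics, az_deg, el_deg)
    total = 0.0
    for k, f in enumerate(freqs):
        # Compensate each mic's hypothesised delay, then average the channels.
        y = sum(
            spectra[m][k] * cmath.exp(2j * math.pi * f * taus[m]) for m in range(len(mics))
        ) / len(mics)
        total += abs(y) ** 2
    return total


def dsb_spatial_map(spectra, mics, freqs, az_grid, el_grid):
    """2-D map of steered DSB power: rows follow el_grid, columns follow az_grid."""
    return [[dsb_power(spectra, mics, freqs, az, el) for az in az_grid] for el in el_grid]


# Hypothetical 4-mic tetrahedral array, radius 4 cm (not the paper's hardware).
_s = 0.04 / math.sqrt(3)
MICS = [(_s, _s, _s), (_s, -_s, -_s), (-_s, _s, -_s), (-_s, -_s, _s)]
FREQS = [500.0, 1000.0, 2000.0]  # assumed analysis bins in Hz
AZ_GRID = list(range(0, 360, 10))  # degrees
EL_GRID = list(range(-90, 91, 10))  # degrees

X = simulate_spectra(MICS, FREQS, az_deg=120, el_deg=30)
power_map = dsb_spatial_map(X, MICS, FREQS, AZ_GRID, EL_GRID)

# The grid cell with the largest steered power is the DOA estimate.
peak_power, peak_el, peak_az = max(
    (power_map[i][j], el, az)
    for i, el in enumerate(EL_GRID)
    for j, az in enumerate(AZ_GRID)
)
print(f"azimuth={peak_az} deg, elevation={peak_el} deg")  # azimuth=120 deg, elevation=30 deg
```

With real recordings, `X` would instead come from a short-time Fourier transform of each channel, and the resulting map could be stacked alongside the per-channel spectral features; the paper's exact feature layout is not given in the abstract. Because the steering phases depend only on the array geometry, the grid, and the frequency bins, they can be computed once and reused for every frame. That is a generic optimization and not necessarily the redundancy removal procedure the authors propose.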