CHAWA: Overcoming Phase Anomalies in Sound Localization
Published in: | IEEE Access 2024, Vol. 12, p. 148653-148665 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Sound Source Localization (SSL) is a key task in audio signal processing, focusing on estimating the position of sound sources relative to a reference point, typically a microphone array. In this paper, we introduce a novel dataset named SoS, specifically designed for indoor SSL scenarios, containing real-life recordings augmented with background noise. Subsequently, we perform a comparative analysis of various features and models across three categories: traditional signal processing methods, classical machine learning approaches, and deep learning architectures. This analysis offers insights into the performance and limitations of each method under varying conditions. Our findings show that augmentation methods like Time Masking, when paired with the AdamW optimizer and Huber loss, typically yield better performance than alternative configurations. Our investigation into robustness to phase information discrepancies led us to conclude that magnitude features might be more useful than traditional features like mel spectra or time-difference-of-arrival (TDoA). Additionally, our study emphasizes the difficulties of using budget microphone receivers in a coplanar quadrilateral arrangement to achieve better sound source localization. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3476492 |