Multi-granularity acoustic information fusion for sound event detection
Published in: | Signal processing 2025-02, Vol.227, p.109691, Article 109691 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Most previous works on sound event detection (SED) are based on binary hard labels of sound events, leaving other scales of information underexplored. To address this problem, we introduce multiple granularities of knowledge into the system to perform hierarchical acoustic information fusion for SED. Specifically, we present an interactive dual-conformer (IDC) module to adaptively fuse the medium-grained and fine-grained acoustic information based on the hard and soft labels of sound events. In addition, we propose a scene-dependent mask estimator (SDME) module to extract the coarse-grained information from acoustic scenes, introducing the scene-event relationships into the SED system. Experimental results show that the proposed IDC and SDME modules efficiently fuse the acoustic information at different scales and therefore further improve the SED performance. The proposed system achieved Top 1 performance in DCASE 2023 Challenge Task 4B.
• A system to fuse acoustic information with different granularities to improve the performance of SED. • An interactive dual-conformer module to extract information from soft and hard labels of sound events. • A scene-dependent mask estimator to introduce scene-event relationships into the SED system. |
---|---|
ISSN: | 0165-1684 |
DOI: | 10.1016/j.sigpro.2024.109691 |
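
The abstract refers to hard and soft labels of sound events and to a scene-dependent mask. As a rough illustration of those two ideas only, the Python sketch below assumes that soft labels are per-frame, per-class values in [0, 1], that hard labels are obtained by thresholding them, and that a scene mask is a per-class weight vector multiplied onto event probabilities. The function names, the 0.5 threshold, and the fixed mask values are hypothetical; this is not the authors' IDC or SDME implementation, in which the mask is estimated by a model rather than written by hand.

```python
# Illustrative sketch only: not the authors' IDC or SDME implementation.
# All names, the 0.5 threshold, and the mask values are assumptions.


def harden(soft_labels, threshold=0.5):
    """Turn soft labels in [0, 1] into binary hard labels (assumed threshold)."""
    return [[1 if value >= threshold else 0 for value in frame]
            for frame in soft_labels]


def apply_scene_mask(event_probs, scene_mask):
    """Scale each class probability by a per-class, scene-specific weight."""
    return [[prob * weight for prob, weight in zip(frame, scene_mask)]
            for frame in event_probs]


# Two frames, three hypothetical event classes.
soft = [[0.9, 0.2, 0.5],
        [0.4, 0.0, 1.0]]
hard = harden(soft)

# Hypothetical mask for one scene: the second class is assumed not to occur there.
mask = [1.0, 0.0, 1.0]
masked = apply_scene_mask(soft, mask)

print(hard)
print(masked)
```

In this toy setup the hard labels discard how strongly each event is present, while the soft labels keep it, and the mask suppresses classes that the assumed scene makes implausible.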