Multi-view representation for sound event recognition

Bibliographic Details
Published in: Signal, Image and Video Processing, 2021, Vol. 15(6), pp. 1211-1219
Main authors: Chandrakala, S.; Venkatraman, M.; Shreyas, N.; Jayalakshmi, S. L.
Format: Article
Language: English
Description
Abstract: The sound event recognition (SER) task is gaining importance in emerging applications such as machine audition, audio surveillance, and environmental audio scene recognition. Recognizing sound events under noisy conditions in real-time surveillance applications is a difficult task. In this paper, we focus on learning patterns from multiple forms (views) of the given sound events. We propose two variants of the Multi-View Representation (MVR)-based approach for the SER task. The first variant combines auditory image-based features with cepstral features extracted from the sound signal. The second variant combines statistical features extracted from the auditory images with the cepstral features of the sound signal. In addition to these variants, Constant Q-transform and Variable Q-transform image-based features are explored to study other effective forms of multi-view representations. A discriminative model-based classifier is then used to recognize these representations as environmental sound events. The performance of the proposed MVR approaches is evaluated on three benchmark sound event datasets, namely ESC-50, DCASE2016 Task 2, and DCASE2018 Task 2. The recognition accuracy of the proposed MVR approach is significantly better than that of other approaches proposed in the recent literature.
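
The pipeline the abstract describes can be illustrated compactly. Below is a minimal sketch, assuming librosa and scikit-learn: a cepstral view (MFCC statistics) is concatenated with a time-frequency image view (Constant-Q transform statistics) and classified with an SVM as the discriminative model. The specific auditory-image features, summary statistics, and classifier used in the paper may differ; the parameter choices here (n_mfcc=20, an RBF kernel) are illustrative assumptions, not the authors' exact configuration.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def multi_view_features(path, sr=22050):
        """Concatenate two 'views' of a sound clip into one feature vector."""
        y, _ = librosa.load(path, sr=sr)
        # View 1: cepstral features (MFCCs), summarized by per-coefficient mean/std.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
        cepstral_view = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
        # View 2: a Constant-Q "image" (log-magnitude), summarized per frequency bin.
        # librosa.vqt could be substituted here for a Variable-Q view.
        cqt_db = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
        image_view = np.concatenate([cqt_db.mean(axis=1), cqt_db.std(axis=1)])
        # Multi-view representation: simple concatenation of the two views.
        return np.concatenate([cepstral_view, image_view])

    # Usage with a hypothetical labeled dataset (paths, labels):
    #   X = np.stack([multi_view_features(p) for p in paths])
    #   clf = SVC(kernel="rbf").fit(X, labels)
    #   predictions = clf.predict(X)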
ISSN: 1863-1703 (print); 1863-1711 (electronic)
DOI: 10.1007/s11760-020-01851-9