RQNet: Residual Quaternion CNN for Performance Enhancement in Low Complexity and Device Robust Acoustic Scene Classification

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2023-01, Vol. 25, p. 1-13
Authors: Madhu, Aswathy; K, Suresh
Format: Article
Language: English
Description
Abstract: Acoustic Scene Classification (ASC) aims to recognize the unique acoustic characteristics of an environment. Recently, Convolutional Neural Networks (CNNs) have boosted the accuracy of ASC algorithms. However, the focus of ASC system designers has shifted from improving accuracy to incorporating real-world considerations such as device robustness and model complexity. In this paper, we address the problem of developing a low-complexity ASC system that can generalize across multiple recording devices. We propose to employ residual quaternion CNNs for low-complexity, device-robust ASC. The proposed model, RQNet, uses quaternion encoding to increase accuracy with fewer parameters. To further enhance the performance of RQNet, we employ a variant of the log-mel spectrogram called the multi-scale mel spectrogram (ms2) to represent the acoustic signal. Experiments on two benchmark ASC datasets indicate that RQNet outperforms a log-mel-spectrum-based baseline by more than twofold. In addition, it achieves good separability between the individual classes, as indicated by AUC (Area Under the ROC Curve) scores of 0.906 and 0.994. Furthermore, it reduces the model size by 82.19% and the floating-point operations by 23.25%. Consequently, RQNet is suitable for deployment in context-aware devices.
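
The abstract attributes RQNet's parameter savings to quaternion encoding. As a rough, hedged illustration of that idea (not the paper's actual implementation), the PyTorch sketch below implements a quaternion 2-D convolution via the Hamilton product; the channel counts, kernel size, initialization, and input shape are illustrative assumptions.

```python
# Hedged sketch of a quaternion 2-D convolution (Hamilton product).
# Assumes channels split evenly into four quaternion components (r, i, j, k);
# sizes below are illustrative, not the RQNet configuration from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuaternionConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        in_q, out_q = in_channels // 4, out_channels // 4
        # Four real kernels shared across components: 4 * (out/4) * (in/4) * k * k
        # weights, i.e. about a quarter of an equivalent real-valued conv layer.
        shape = (out_q, in_q, kernel_size, kernel_size)
        self.w_r = nn.Parameter(torch.randn(shape) * 0.01)
        self.w_i = nn.Parameter(torch.randn(shape) * 0.01)
        self.w_j = nn.Parameter(torch.randn(shape) * 0.01)
        self.w_k = nn.Parameter(torch.randn(shape) * 0.01)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # Split the input channels into the four quaternion components.
        r, i, j, k = torch.chunk(x, 4, dim=1)
        conv = lambda t, w: F.conv2d(t, w, stride=self.stride, padding=self.padding)
        # Hamilton product written out with real-valued convolutions.
        out_r = conv(r, self.w_r) - conv(i, self.w_i) - conv(j, self.w_j) - conv(k, self.w_k)
        out_i = conv(r, self.w_i) + conv(i, self.w_r) + conv(j, self.w_k) - conv(k, self.w_j)
        out_j = conv(r, self.w_j) - conv(i, self.w_k) + conv(j, self.w_r) + conv(k, self.w_i)
        out_k = conv(r, self.w_k) + conv(i, self.w_j) - conv(j, self.w_i) + conv(k, self.w_r)
        return torch.cat([out_r, out_i, out_j, out_k], dim=1)

# Toy usage on a mel-spectrogram-shaped input (batch, channels, mel bins, frames).
x = torch.randn(1, 64, 128, 431)
layer = QuaternionConv2d(64, 128, kernel_size=3, padding=1)
print(layer(x).shape)  # torch.Size([1, 128, 128, 431])
```

Because the four components reuse the same four kernels through the Hamilton product, the linear part of such a layer carries roughly a quarter of the weights of a real-valued convolution of the same width, which is the general mechanism by which quaternion networks trade parameter count for cross-channel weight sharing.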
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3241553