Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification


Bibliographic details
Published in: The Journal of the Acoustical Society of America, 2021-06, Vol. 149 (6), pp. 4198-4213
Main authors: Chen, Hangting; Liu, Zuozhen; Liu, Zongming; Zhang, Pengyuan
Format: Article
Language: English
Online access: Full text
Description
Summary: In acoustic scene classification (ASC), acoustic features play a crucial role in the extraction of scene information, which can be stored over different time scales. Moreover, the limited size of the dataset may lead to a biased model that performs poorly on recordings from unseen cities and on confusing scene classes. This paper proposes a long-term wavelet feature that captures discriminative long-term scene information. The extracted scalogram requires less storage and can be classified faster and more accurately than classic Mel filter bank coefficients (FBank). Furthermore, a data augmentation scheme is adopted to improve the generalization of the ASC systems; it extends the database iteratively using auxiliary classifier generative adversarial neural networks (ACGANs) and a deep learning-based sample filter. Experiments were conducted on datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. On the DCASE17 and DCASE19 datasets, the proposed techniques improved performance over the FBank classifier. Moreover, the ACGAN-based data augmentation scheme achieved an absolute accuracy improvement of 6.10% on recordings from unseen cities, far exceeding classic augmentation methods.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/10.0005202