Sliding-window based scale-frequency map for bird sound classification using 2D- and 3D-CNN

Bibliographic details
Published in: Expert Systems with Applications, 2022-11, Vol. 207, p. 118054, Article 118054
Main authors: Xie, Jie; Zhu, Mingying
Format: Article
Language: English
Description
Summary: Bird calls often contain distinctive information for discriminating between species. Previous studies have investigated various features for bird sound classification. This paper proposes a novel feature set for automatically classifying bird sounds. Specifically, a sliding window is first applied to the audio waveform to obtain windowed signals, from which the five windows with the highest energy are selected. Then, orthogonal matching pursuit is applied to those windowed signals to extract important Gabor atoms. Next, a multi-window scale-frequency map (SFM) is constructed as the input to three different CNN models for bird sound classification. Experiments on two bird sound classification datasets demonstrate the effectiveness of the proposed framework for supplementing traditional audio time–frequency representations. On the 14-class bird sound dataset, the classification accuracy and F1-score are 98.96% and 98.93%, respectively, obtained by 2D-CNN-v2 with the ERB-scaled SFM as the input. On the 18-class bird sound dataset, the highest accuracy and F1-score are 97.82% and 97.47%, respectively, obtained by 2D-CNN-v2 with the Bark-scaled SFM as the input. Moreover, 2D-CNN-v2 with the Bark-scaled SFM as the input achieves the best performance on the combined data, with an accuracy of 98.59% and an F1-score of 98.51%.
• Characterize bird sounds using optimized scale-frequency maps.
• Classify bird sounds based on scale-frequency maps and CNN models.
• Compare 2D and 3D features for bird sound classification.
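
The feature-extraction steps named in the summary (sliding window, selection of the highest-energy windows, orthogonal matching pursuit over Gabor atoms, and a per-window scale-frequency map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the fixed atom centre, the cosine phase, the linear frequency grid (the paper uses ERB- and Bark-scaled axes), and the choice to accumulate absolute coefficients per (scale, frequency) cell are all assumptions made for illustration.

```python
import numpy as np

def top_energy_windows(signal, win_len, hop, n_keep=5):
    """Slide a window over `signal`; return the n_keep highest-energy windows."""
    signal = np.asarray(signal, dtype=float)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    frames = np.stack([signal[s:s + win_len] for s in starts])
    energy = np.sum(frames ** 2, axis=1)
    # Indices of the n_keep largest energies, put back in temporal order.
    keep = np.sort(np.argsort(energy)[::-1][:n_keep])
    return frames[keep], starts[keep]

def gabor_dictionary(win_len, scales, freqs, sample_rate):
    """Unit-norm real Gabor atoms, one per (scale, frequency) pair.

    Assumption: every atom is centred in the window and uses a cosine phase.
    """
    t = (np.arange(win_len) - win_len / 2) / sample_rate
    atoms, params = [], []
    for s in scales:
        for f in freqs:
            g = np.exp(-np.pi * (t / s) ** 2) * np.cos(2 * np.pi * f * t)
            atoms.append(g / np.linalg.norm(g))
            params.append((s, f))
    return np.array(atoms).T, params  # dictionary has shape (win_len, n_atoms)

def omp(x, D, n_atoms):
    """Orthogonal matching pursuit: greedy atom choice plus least-squares refit."""
    residual, support, coef = x.copy(), [], np.zeros(0)
    for _ in range(n_atoms):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:  # guard against reselection caused by round-off
            break
        support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef

def scale_frequency_map(windows, scales, freqs, sample_rate, n_atoms=10):
    """Stack one (scale x frequency) map per window (hypothetical construction)."""
    D, _ = gabor_dictionary(windows.shape[1], scales, freqs, sample_rate)
    sfm = np.zeros((len(windows), len(scales), len(freqs)))
    for w, x in enumerate(windows):
        support, coef = omp(x, D, n_atoms)
        for idx, c in zip(support, coef):
            # Atoms are stored scale-major, so idx = scale_i * len(freqs) + freq_i.
            sfm[w, idx // len(freqs), idx % len(freqs)] += abs(c)
    return sfm
```

Under these assumptions, calling `top_energy_windows(audio, win_len, hop)` followed by `scale_frequency_map(windows, scales, freqs, sample_rate)` yields an array of shape (5, number of scales, number of frequencies). Such an array could be treated as a multi-channel image for a 2D CNN or as a volume for a 3D CNN, which is one plausible reading of the 2D and 3D features the summary compares.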
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.118054