Birdsong classification based on multi-feature fusion


Bibliographic details
Published in: Multimedia Tools and Applications, 2021-12, Vol. 80 (30), p. 36529-36547
Main authors: Yan, Na; Chen, Aibin; Zhou, Guoxiong; Zhang, Zhiqiang; Liu, Xiangyong; Wang, Jianwu; Liu, Zhihua; Chen, Wenjie
Format: Article
Language: English
Description
Abstract: Classifying birdsong is important for monitoring bird populations in their habitats. Targeting birdsong datasets with complex and diverse background audio, this paper introduces Chroma, an acoustic feature from voice and music analysis. Chroma is spliced and fused with the commonly used birdsong features Log-Mel Spectrogram (LM) and Mel Frequency Cepstrum Coefficient (MFCC) to enrich the representational capacity of any single feature. Because birdsong changes continuously and dynamically over time, a combined 3DCNN-LSTM model is proposed as the classifier to make the network more sensitive to time-varying birdsong information. Four sets of bird audio data were selected from the Xeno-Canto website to evaluate how LM, MFCC and Chroma should be fused to maximize the information extracted from birdsong audio. The experimental results show that the LM-MFCC-C feature combination achieves the best result, a mean average precision (mAP) of 97.9%.
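The splicing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "spliced and fused" means concatenating frame-aligned feature matrices along the feature axis; the abstract does not specify the layout, so the paper's actual fusion may differ. The function names (`num_frames`, `fuse_features`) and the toy values are hypothetical, and real LM, MFCC and Chroma matrices would come from an audio feature extractor rather than hand-written lists.

```python
from typing import List

Matrix = List[List[float]]  # rows = feature bins, columns = time frames


def num_frames(feature: Matrix) -> int:
    """Return the frame count, checking that every row has the same length."""
    if not feature:
        raise ValueError("feature matrix is empty")
    frames = len(feature[0])
    if any(len(row) != frames for row in feature):
        raise ValueError("rows have inconsistent frame counts")
    return frames


def fuse_features(*features: Matrix) -> Matrix:
    """Splice feature matrices along the feature (row) axis.

    Assumption: fusion is read here as plain concatenation of
    frame-aligned matrices; the paper's exact layout may differ.
    """
    if not features:
        raise ValueError("need at least one feature matrix")
    frames = num_frames(features[0])
    fused: Matrix = []
    for feature in features:
        if num_frames(feature) != frames:
            raise ValueError("all features must share the same number of frames")
        fused.extend(list(row) for row in feature)  # copy rows, keep inputs intact
    return fused


# Toy stand-ins for LM, MFCC and Chroma over 3 time frames (values are made up).
lm = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]      # 2 mel bands
mfcc = [[1.0, 1.1, 1.2]]                     # 1 cepstral coefficient
chroma = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # 2 pitch classes

fused = fuse_features(lm, mfcc, chroma)
print(len(fused), len(fused[0]))  # feature rows, time frames
```

In this reading, the fused matrix keeps the time axis unchanged and only grows along the feature axis, which is what lets a sequence model such as an LSTM still step through the frames in order.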
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-11396-9