A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition

Bibliographic Details
Published in: Applied Soft Computing, 2023-10, Vol. 146, Article 110678
Main Authors: Hu, Shipeng; Chu, Yihang; Tang, Lu; Zhou, Guoxiong; Chen, Aibin; Sun, Yurong
Format: Article
Language: English
Description
Abstract: Bird song recognition plays an important role in ecosystem balance monitoring, biodiversity detection, and biodiversity conservation. Due to the complexity of the natural environment, deep learning methods that extract audio features with a single filter suffer from information loss, and identifying bird sounds efficiently and quickly remains a challenge. To address this problem, a lightweight multi-sensory field dual-feature fusion residual network (LDFSRE-NET) is proposed in this paper. First, feature extraction filters based on the Mel filterbank and SincNet are used to extract the low-frequency and timbre information of bird song. The proposed dual-feature fusion module (FFMS) fuses the low-frequency and timbre information together with the differences between the two feature sets. Second, a double-layer residual module (DBNet), built by connecting a basicblock and a downblock, is used as the backbone network for bird song recognition to improve training speed. To diversify the receptive fields of the backbone network, the 3 × 3 convolutional modules in the basicblocks of the two residual modules are replaced with Diverse Branch Blocks, whose multiple branches make the network perform better on recognition tasks in complex conditions. Then, a ShuffleAttention module is embedded between the two layers of the residual module to transfer valid information, enhance the spectrogram ripple features, and further improve the network's recognition performance. Finally, extensive experiments are conducted on three datasets: a self-built 30-class bird song dataset (Birdselfdata) and the public datasets Birdsdata and UrbanSound8K. The proposed model surpasses state-of-the-art sound recognition methods in both efficiency and accuracy, reaching recognition accuracies of 96.75%, 96.46%, and 97.98% on the three datasets, with F1-scores of 96.79%, 96.39%, and 97.88%.

Highlights:
• We built a 30-species bird song dataset ourselves.
• We propose a dual-feature fusion module (FFMS).
• We propose a lightweight backbone network based on residual modules.
• We introduce different receptive fields into the backbone network.
• We improve the transmission of effective information between residual modules.
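The abstract states only that FFMS fuses the Mel (low-frequency) and SincNet (timbre) feature maps together with the differences between the two feature sets; it does not describe the module's internals. Below is a minimal sketch of such a fusion step in PyTorch, under the assumption that the two feature maps share the same shape and that fusion is a channel-wise concatenation of both maps and their element-wise difference followed by a 1 × 1 convolution. The class name DualFeatureFusion and all layer choices are illustrative, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): fuse a Mel-filterbank
# feature map with a SincNet-derived feature map using their element-wise
# difference as an extra cue, then mix the channels with a 1x1 convolution.
import torch
import torch.nn as nn


class DualFeatureFusion(nn.Module):
    """Hypothetical dual-feature fusion block for two same-shaped feature maps."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 3 * in_channels: [mel, sinc, |mel - sinc|] stacked along the channel axis
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, mel_feat: torch.Tensor, sinc_feat: torch.Tensor) -> torch.Tensor:
        diff = torch.abs(mel_feat - sinc_feat)            # difference between the two feature sets
        x = torch.cat([mel_feat, sinc_feat, diff], dim=1)  # channel-wise concatenation
        return self.fuse(x)


if __name__ == "__main__":
    # Toy usage: two 1-channel 64x128 "spectrogram" features from the same clip.
    mel = torch.randn(4, 1, 64, 128)
    sinc = torch.randn(4, 1, 64, 128)
    fused = DualFeatureFusion(in_channels=1, out_channels=16)(mel, sinc)
    print(fused.shape)  # torch.Size([4, 16, 64, 128])
```

The fused map would then feed the residual backbone described in the abstract; how DBNet, the Diverse Branch Blocks, and ShuffleAttention are wired in detail is specified only in the full paper.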
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2023.110678