Cross-corpus open set bird species recognition by vocalization

Bibliographic details
Published in: Ecological indicators, 2023-10, Vol. 154, p. 110826, Article 110826
Main authors: Xie, Jiangjian; Zhang, Luyang; Zhang, Junguo; Zhang, Yanyun; Schuller, Björn W.
Format: Article
Language: English
Description
Summary:
• Instance Frequency Normalization (IFN) is proposed to eliminate instance-specific frequency differences across different corpora.
• Threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discover unknown species.
• An x-vector feature extraction model integrating TDNN and LSTM is designed to better capture sequence information.
• Focusing on frequency information is more beneficial for cross-corpus open set bird species recognition by vocalization.

In the wild, vocalizations of the same bird species may differ between populations (e.g., so-called dialects). In addition, the number of species is not known in advance. These two facts make bird species recognition based on vocalization a challenging task. This study treats the task as an open set recognition (OSR) problem in a cross-corpus scenario. We propose Instance Frequency Normalization (IFN) to remove instance-specific differences across different corpora. Furthermore, an x-vector feature extraction model integrating a Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM) is designed to better capture sequence information. Finally, threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discriminate the extracted x-vector features and discover unknown classes. Compared with the best results of the existing method, the average accuracies (ACCs) for the single-corpus and cross-corpus experiments are improved, suggesting that our method offers a potential solution and improved performance for cross-corpus bird species recognition based on vocalization under open set conditions.
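
The record does not give the formulas behind IFN or the threshold-based decision, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that IFN standardizes each frequency bin of a single spectrogram over time, and it replaces PLDA scoring with an arbitrary array of per-species scores. The names `instance_frequency_normalize`, `open_set_decision`, `eps`, and `threshold` are hypothetical.

```python
import numpy as np


def instance_frequency_normalize(spec, eps=1e-5):
    """Standardize each frequency bin of one spectrogram over time.

    spec: array of shape (n_freq_bins, n_frames) for a single recording.
    Assumption: statistics are computed per instance and per frequency
    bin; the paper's exact definition may differ.
    """
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True)
    return (spec - mean) / (std + eps)


def open_set_decision(scores, threshold):
    """Return the index of the best-scoring known species, or -1 for unknown.

    scores: 1-D array with one score per known species. In the paper these
    would be PLDA scores; here they are plain numbers (assumption).
    """
    best = int(np.argmax(scores))
    # Reject the sample as an unknown species when even the best score is too low.
    return best if scores[best] >= threshold else -1
```

Under these assumptions, two recordings whose frequency bins differ only by a constant gain or offset map to nearly the same normalized spectrogram, which is the kind of instance-specific difference the summary says IFN is meant to remove. The threshold would have to be tuned on held-out data.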
ISSN: 1470-160X
DOI: 10.1016/j.ecolind.2023.110826