Attention based gender and nationality information exploration for speaker identification

Bibliographic details
Published in: Digital signal processing 2022-04, Vol.123, p.103449, Article 103449
Main authors: Tang, Yong; Liu, Chuang; Leng, Yan; Zhao, Weiwei; Sun, Jiande; Sun, Chengli; Wang, Rongyan; Yuan, Qi; Li, Dengwang; Xu, Huaqiang
Format: Article
Language: English
Online access: Full text
Description
Summary: Gender and nationality information has not been exploited in large-scale speaker recognition despite being provided in the popular VoxCeleb1 dataset. This paper explores methods that combine high-level features extracted from the gender and nationality information with low-level acoustic features for speaker identification. To our knowledge, this is the first time that the gender and nationality information provided in VoxCeleb1 has been used for speaker identification. Specifically, we propose a Gender-Guided Spectrogram-Attention network and a Nationality-Guided Spectrogram-Attention network that embed gender and nationality information into the spectrogram features, respectively. The resulting gender and nationality embeddings are then used together with the spectrogram features for classification. Experimental results show that the proposed methods successfully capture the gender and nationality information of the speakers and improve speaker identification accuracy.
ISSN: 1051-2004, 1095-4333
DOI: 10.1016/j.dsp.2022.103449
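
The summary describes attention that lets a gender or nationality embedding guide the spectrogram features, with the result used alongside those features for classification. The record does not give the network details, so the following is only a minimal sketch of one generic way to do this (scaled dot-product attention over time frames), not the authors' architecture. The function name `attribute_guided_pooling`, the toy frame vectors, and the two-dimensional `gender_embedding` are all hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attribute_guided_pooling(frames, attribute_embedding):
    """Pool spectrogram frames with attention guided by an attribute embedding.

    ASSUMPTION: this is a generic scaled dot-product attention, used here as a
    stand-in; the paper's actual attention mechanism is not given in the record.

    frames: list of T feature vectors, each of dimension D.
    attribute_embedding: query vector of dimension D (e.g. a gender embedding).
    Returns (fused_vector, attention_weights).
    """
    dim = len(attribute_embedding)
    # One relevance score per frame: how well it matches the attribute query.
    scores = [dot(f, attribute_embedding) / math.sqrt(dim) for f in frames]
    weights = softmax(scores)
    # Attribute-guided summary: attention-weighted sum of the frames.
    attended = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Plain acoustic summary: unweighted mean of the frames.
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    # Concatenate both summaries so a classifier can use them together.
    return attended + mean, weights


# Hypothetical toy input: three frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gender_embedding = [2.0, 0.0]  # hypothetical learned embedding

fused, weights = attribute_guided_pooling(frames, gender_embedding)
print("attention weights:", weights)
print("fused feature vector:", fused)
```

In this toy input, the first and third frames align with the hypothetical embedding and therefore receive equal, larger attention weights than the second frame. In a real system, the embedding and the frame features would be learned jointly and the fused vector passed to a speaker classifier.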