Weighted X-Vectors for Robust Text-Independent Speaker Verification with Multiple Enrollment Utterances

Detailed description

Bibliographic details
Published in:	Circuits, Systems, and Signal Processing, 2022-05, Vol. 41 (5), p. 2825-2844
Main authors:	Mohammadi, Mohsen; Sadegh Mohammadi, Hamid Reza
Format:	Article
Language:	eng
Subjects:	Artificial neural networks; Circuits and Systems; Datasets; Discriminant analysis; Electrical Engineering; Electronics and Microelectronics; Engineering; Instrumentation; Performance degradation; Signal, Image and Speech Processing; Speech recognition; Statistical analysis; Statistical methods; Verification
Online access:	Full text

Description
Abstract:	Speech is a user-friendly signal for identity recognition with low computational complexity and implementation cost. However, using speech samples to identify persons has several limitations, such as degraded performance in real environments due to noise and channel effects. In recent years, deep neural network (DNN)-based approaches have provided good results in speaker verification and outperformed i-vector-based methods. The x-vector is a DNN-based speaker embedding that, in combination with probabilistic linear discriminant analysis (PLDA), increases both the accuracy and robustness of speaker verification systems. In this paper, we propose weighted x-vectors as a method for enhancing speaker verification in both clean and noisy environments. The method exploits the statistical properties of the target speaker's enrollment x-vectors to weight the test x-vector, which improves scoring accuracy and thus the performance of the whole verification system. Experiments were conducted using the VoxCeleb dataset, MFCC feature vectors, and PLDA scoring. VoxCeleb is a large-scale dataset that contains real-world short-duration speech samples from over 6,000 speakers. Multicondition training for LDA and PLDA was also employed to improve the system's performance under mismatched noisy conditions. The findings showed that using weighted x-vectors led to 18% and 10% reductions in equal error rate (EER) for clean and noisy conditions, respectively. The experiments also show that increasing the number of enrollment x-vectors further improves the performance of the proposed method.
ISSN:	0278-081X 1531-5878
DOI:	10.1007/s00034-021-01915-2
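
Illustrative sketch (not part of the catalog record): the abstract describes weighting the test x-vector using statistics of the target speaker's enrollment x-vectors and evaluating by equal error rate, but it does not state the weighting formula. The Python sketch below is therefore only one plausible reading under stated assumptions: the weights are taken as the inverse per-dimension standard deviation of the enrollment x-vectors, and cosine similarity stands in for the PLDA scoring used in the paper. The function names `weight_test_xvector`, `cosine_score`, and `equal_error_rate` are hypothetical.

```python
import numpy as np


def weight_test_xvector(enroll, test, eps=1e-6):
    """Scale each dimension of a test x-vector by an enrollment-derived weight.

    ASSUMPTION: weights are the inverse per-dimension standard deviation of
    the enrollment x-vectors, normalized to have mean 1. The paper's actual
    weighting rule is not given in the abstract.
    """
    enroll = np.asarray(enroll, dtype=float)  # shape (n_enroll, dim)
    test = np.asarray(test, dtype=float)      # shape (dim,)
    std = enroll.std(axis=0)                  # spread of each dimension
    weights = 1.0 / (std + eps)               # stable dimensions count more
    weights /= weights.mean()                 # keep the overall scale
    return weights * test


def cosine_score(enroll, test):
    """Cosine similarity between the mean enrollment x-vector and a test
    x-vector. ASSUMPTION: a simple stand-in for PLDA scoring."""
    mean = np.asarray(enroll, dtype=float).mean(axis=0)
    test = np.asarray(test, dtype=float)
    return float(mean @ test / (np.linalg.norm(mean) * np.linalg.norm(test)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the operating point where the false acceptance
    rate and the false rejection rate are closest to each other."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Accept a trial when its score is at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)
```

In this sketch, a trial would be scored as `cosine_score(enroll, weight_test_xvector(enroll, test))`, and the resulting target and non-target scores would be passed to `equal_error_rate`. With more enrollment x-vectors the per-dimension statistics are estimated more reliably, which is consistent with the abstract's observation that performance improves as the number of enrollment x-vectors grows.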