Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Bibliographic Details
Published in:	IEEE Access, 2020, Vol. 8, pp. 141838-141849
Main authors:	Kang, Woo Hyun; Mun, Sung Hwan; Han, Min Hyun; Kim, Nam Soo
Format:	Article
Language:	English
Keywords:	Deep learning, domain disentanglement, Embedding, Law enforcement, Machine learning, Nuisance, Performance degradation, Performance evaluation, Robustness, speaker verification, Speech embedding, Task analysis, Training, Verification

Description
Abstract:	Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector that is disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods on the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2020.3012893