aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems

Full Description

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, Vol. 30, p. 772-784
Main Authors: Mingote, Victoria; Miguel, Antonio; Ribas, Dayana; Ortega, Alfonso; Lleida, Eduardo
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Summary: Metric learning approaches have been widely adopted for training Speaker Verification (SV) systems based on Deep Neural Networks (DNNs), since they use a loss function more consistent with the evaluation process than traditional identification losses. However, these methods do not consider the performance measure and can involve a high computational cost, for example due to the need for careful pair or triplet selection. This paper proposes the approximated Detection Cost Function (aDCF) loss, a loss function based on the decision errors of SV systems, namely the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). With aDCF loss as the training objective function, the end-to-end system learns how to minimize decision errors. Furthermore, we replace the typical linear layer at the end of the DNN with a cosine distance layer, which reduces the mismatch between the metric used during training and the metric used during evaluation. The aDCF loss function was evaluated on the RSR2015 Part I and Part II datasets for text-dependent speaker verification. The system trained with aDCF loss outperforms all the state-of-the-art loss functions considered in this paper on both parts of the database.
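A minimal sketch of the idea described in the abstract, assuming a PyTorch-style setup: the hard accept/reject step inside FRR and FAR is softened with a sigmoid so the detection cost becomes differentiable, and the final linear layer is swapped for a cosine-scoring layer. The names (CosineScoringLayer, adcf_style_loss) and the parameter values (alpha, threshold, the FRR/FAR cost weights) are illustrative assumptions, not the exact formulation used in the paper.

import torch
import torch.nn.functional as F

class CosineScoringLayer(torch.nn.Module):
    """Hypothetical last layer: cosine similarity between the utterance
    embedding and each speaker model vector, instead of a plain linear layer."""

    def __init__(self, emb_dim, num_speakers):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(num_speakers, emb_dim))

    def forward(self, emb):
        # Normalizing both the embedding and the speaker models turns the
        # dot product into a cosine score in [-1, 1].
        return F.linear(F.normalize(emb, dim=-1),
                        F.normalize(self.weight, dim=-1))


def adcf_style_loss(scores, target_mask, alpha=10.0, threshold=0.5,
                    c_fr=0.75, c_fa=0.25):
    """Differentiable approximation of the detection cost (illustrative).

    scores:      cosine scores, shape [batch, num_speakers].
    target_mask: 1.0 where the score corresponds to the true speaker,
                 0.0 elsewhere (a one-hot matrix for closed-set training).
    The hard decision is softened with a sigmoid of steepness `alpha`,
    so FRR and FAR become differentiable in the scores.
    """
    soft_accept = torch.sigmoid(alpha * (scores - threshold))
    target = target_mask.float()
    nontarget = 1.0 - target

    # Approximate FRR: fraction of target trials that are (softly) rejected.
    frr = ((1.0 - soft_accept) * target).sum() / target.sum().clamp(min=1.0)
    # Approximate FAR: fraction of non-target trials that are (softly) accepted.
    far = (soft_accept * nontarget).sum() / nontarget.sum().clamp(min=1.0)

    # Weighted detection cost; c_fr and c_fa play the role of the DCF costs.
    return c_fr * frr + c_fa * far


# Usage sketch: embeddings from some DNN encoder, labels as speaker indices.
emb = torch.randn(8, 256)                     # stand-in for encoder output
labels = torch.randint(0, 100, (8,))
layer = CosineScoringLayer(emb_dim=256, num_speakers=100)
scores = layer(emb)
target_mask = F.one_hot(labels, num_classes=100).float()
loss = adcf_style_loss(scores, target_mask)
loss.backward()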
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2022.3145307