Speaker Verification Based on Channel Attention and Adaptive Joint Loss

Bibliographic details
Published in: Electronics (Basel) 2025-01, Vol. 14 (3), p. 548
Main authors: Fan, Houbin; Li, Jun; Ge, Fengpei; Liang, Chunyan
Format: Article
Language: English
Online access: Full text
Description
Abstract: In deep learning-based speaker verification, the loss function plays a crucial role. Most systems rely on a single loss function, or simply sum multiple losses with manually adjusted weights, increasing experimental complexity and failing to fully leverage the complementary characteristics of different losses. To address this, this paper proposes a speaker verification system based on channel attention and adaptive joint loss optimization. An adaptive joint loss function dynamically adjusts loss weights, allowing the model to better learn the similarities and differences of speakers, narrowing the gap between closed- and open-set testing, and enhancing generalization ability. A channel attention squeeze-and-excitation module is designed to improve the network’s ability to extract channel-specific features. On the AISHELL-1 dataset, the system achieved an equal error rate of 0.84% and a minimum detection cost function of 0.0528. Experimental results demonstrate a significant improvement in speaker verification performance, confirming the effectiveness of the proposed system.
ISSN:2079-9292
DOI:10.3390/electronics14030548
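
Note on the reported metric: the abstract quotes an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below illustrates that general definition only; it is not taken from the paper. The function name `equal_error_rate` and the toy scores are hypothetical, and the threshold sweep over observed scores is a simple approximation chosen for readability.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection rate: genuine trials that fall below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores (made up for illustration, not data from the paper).
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
```

With these toy scores, one genuine trial in four is rejected and one impostor trial in four is accepted at a threshold of 0.6, so the sketch yields an EER of 25%. Evaluation toolkits typically interpolate between thresholds on much larger trial lists, so their values can differ slightly from this approximation.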