Investigating Long-Term and Short-Term Time-Varying Speaker Verification

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, Vol. 32, pp. 3408-3423
Authors: Qin, Xiaoyi; Li, Na; Duan, Shufei; Li, Ming
Format: Article
Language: English
Abstract: The performance of speaker verification systems can be adversely affected by time domain variations. However, limited research has been conducted on time-varying speaker verification due to the absence of appropriate datasets. This paper investigates the impact of long-term and short-term time variation in speaker verification and proposes solutions to mitigate these effects. For long-term speaker verification (i.e., cross-age speaker verification), we introduce an age-decoupling adversarial learning method that learns age-invariant speaker representations by mining age information from the VoxCeleb dataset. For short-term speaker verification, we collect the SMIIP-TimeVarying (SMIIP-TV) Dataset, which includes recordings from 373 speakers at multiple time slots every day for 90 consecutive days, along with other relevant meta information. Using this dataset, we analyze the time-varying behavior of speaker embeddings and propose a novel but realistic time-varying speaker verification task, termed incremental sequence-pair speaker verification. This task involves continuous interaction between enrollment audios and a sequence of testing audios, with the aim of improving performance over time. We introduce template updating to counter the negative effects of time variation, formulate the template updating process as a Markov Decision Process, and propose a template updating method based on deep reinforcement learning (DRL). The DRL policy network acts as an agent that determines whether and to what extent the template should be updated. In summary, this paper releases our collected database, investigates both long-term and short-term time-varying scenarios, and provides insights into and solutions for time-varying speaker verification.
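
The incremental sequence-pair setting and the DRL-driven template updating described in the abstract can be pictured with a small sketch. The snippet below is a minimal illustration only, assuming cosine-scored speaker embeddings and a hand-written stand-in (update_weight_policy) for the learned policy network; all function names, thresholds, and update rules are hypothetical assumptions and are not taken from the paper.

```python
# Hypothetical sketch of template updating for incremental sequence-pair
# verification: a policy decides how strongly to fold each new test embedding
# into the enrollment template. Not the authors' implementation.
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def update_weight_policy(state):
    # Stand-in for the DRL policy network: maps a state (here, the
    # template/test similarity and elapsed days) to an update weight in [0, 1].
    similarity, days_elapsed = state
    if similarity < 0.3:
        return 0.0  # likely an impostor or noisy trial: leave the template unchanged
    return min(0.5, 0.1 + 0.01 * days_elapsed)  # allow larger updates as time passes

def sequential_verification(template, test_embeddings, threshold=0.5):
    """Score each test utterance against the current template, then let the
    policy decide if and how much the template is updated."""
    decisions = []
    for day, emb in enumerate(test_embeddings):
        score = cosine(template, emb)
        decisions.append(score >= threshold)
        alpha = update_weight_policy((score, day))
        if alpha > 0.0:
            template = (1.0 - alpha) * template + alpha * emb  # convex update
            template = template / np.linalg.norm(template)
    return decisions
```

In the paper this update decision is learned by reinforcement learning over the 90-day SMIIP-TV recordings rather than hand-coded as above; the sketch only shows where such a policy sits in the verification loop.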
ISSN:2329-9290
2329-9304
DOI:10.1109/TASLP.2024.3428910