Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | We investigated an enhancement and a domain adaptation approach to make
speaker verification systems robust to perturbations of far-field speech. In
the enhancement approach, using paired (parallel) reverberant-clean speech, we
trained a supervised Generative Adversarial Network (GAN) along with a feature
mapping loss. For the domain adaptation approach, we trained a Cycle Consistent
Generative Adversarial Network (CycleGAN), which maps features from the far-field
domain to the speaker embedding training domain. This was trained on unpaired
data in an unsupervised manner. Both networks, termed Supervised Enhancement
Network (SEN) and Domain Adaptation Network (DAN) respectively, were trained
with multi-task objectives in the (filter-bank) feature domain. On a simulated test
setup, we first noted the benefit of using a feature mapping (FM) loss along with
adversarial loss in SEN. Then, we tested both supervised and unsupervised
approaches on several real noisy datasets. We observed relative improvements
ranging from 2% to 31% in terms of DCF. Using three training schemes, we also
established the effectiveness of the novel DAN approach. |
---|---|
DOI: | 10.48550/arxiv.2005.08331 |
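
The summary reports gains as relative improvements in DCF (detection cost function, where lower is better). As a minimal sketch of that arithmetic, assuming the usual definition of relative improvement as the reduction divided by the baseline value: the function name and the numbers below are hypothetical and are not taken from the article.

```python
def relative_improvement(baseline_dcf: float, enhanced_dcf: float) -> float:
    """Percentage reduction in DCF relative to the baseline (lower DCF is better)."""
    if baseline_dcf <= 0:
        raise ValueError("baseline_dcf must be positive")
    return 100.0 * (baseline_dcf - enhanced_dcf) / baseline_dcf


# Hypothetical DCF values, chosen only to illustrate the calculation.
baseline = 0.50   # system without enhancement or adaptation (assumed value)
enhanced = 0.40   # system with an enhancement front end (assumed value)
print(f"relative improvement: {relative_improvement(baseline, enhanced):.1f}%")
```

A negative result would indicate that the front end made the DCF worse than the baseline.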