Feature Enhancement with Deep Feature Losses for Speaker Verification
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Speaker verification still generalizes poorly to
novel adverse environments. We leverage recent advances in deep-learning-based
speech enhancement and propose a feature-domain supervised
denoising solution. We propose using a Deep Feature Loss, which optimizes
the enhancement network in the hidden activation space of a pre-trained
auxiliary speaker embedding network. We experimentally verify the approach on
simulated and real data. A simulated testing setup is created using various
noise types at different SNR levels. For evaluation on real data, we choose the
BabyTrain corpus, which consists of recordings of children in uncontrolled
environments. We observe consistent gains in every condition over the
state-of-the-art augmented Factorized-TDNN x-vector system. On the BabyTrain
corpus, we observe relative gains of 10.38% and 12.40% in minDCF and EER,
respectively. |
---|---|
DOI: | 10.48550/arxiv.1910.11905 |
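
The summary describes a loss computed in the hidden activation space of a pre-trained auxiliary network rather than directly on the features. The following is a minimal sketch of that idea only, not the authors' implementation: the toy feed-forward network, the choice of mean absolute difference per layer, the equal layer weighting, and all names (`hidden_activations`, `deep_feature_loss`, `aux_layers`) are assumptions made for illustration.

```python
import random


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vector]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def hidden_activations(layers, features):
    """Run a toy auxiliary network and collect the output of every layer."""
    activations = []
    hidden = features
    for weights in layers:
        hidden = relu(linear(weights, hidden))
        activations.append(hidden)
    return activations


def deep_feature_loss(layers, enhanced, clean):
    """Sum, over layers, the mean absolute difference between the auxiliary
    network's activations for enhanced and clean features (assumed distance)."""
    loss = 0.0
    pairs = zip(hidden_activations(layers, enhanced),
                hidden_activations(layers, clean))
    for act_enhanced, act_clean in pairs:
        diffs = [abs(a - b) for a, b in zip(act_enhanced, act_clean)]
        loss += sum(diffs) / len(diffs)
    return loss


# Hypothetical stand-in for a frozen, pre-trained speaker embedding network:
# two random 4x4 layers. A real system would load trained weights instead.
rng = random.Random(0)
dim = 4
aux_layers = [
    [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    for _ in range(2)
]

clean = [0.5, -0.2, 0.1, 0.9]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]

loss_identical = deep_feature_loss(aux_layers, clean, clean)
loss_noisy = deep_feature_loss(aux_layers, noisy, clean)
```

In training, the enhancement network's output would take the place of `noisy`, and only the enhancement network's parameters would be updated while the auxiliary network stays fixed.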