Denoised Senone I-Vectors for Robust Speaker Verification
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018-04, Vol. 26 (4), pp. 820-830 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Recently, it has been shown that senone i-vectors, whose posteriors are produced by senone deep neural networks (DNNs), outperform the conventional Gaussian mixture model (GMM) i-vectors in both speaker and language recognition tasks. The success of senone i-vectors relies on the capability of the DNN to incorporate phonetic information into the i-vector extraction process. In this paper, we argue that to apply senone i-vectors in noisy environments, it is important to robustify the phonetically discriminative acoustic features and senone posteriors estimated by the DNN. To this end, we propose a deep architecture formed by stacking a deep belief network on top of a denoising autoencoder (DAE). After backpropagation fine-tuning, the network, referred to as denoising autoencoder-deep neural network (DAE-DNN), facilitates the extraction of robust phonetically discriminative bottleneck (BN) features and senone posteriors for i-vector extraction. We refer to the resulting i-vectors as denoised BN-based senone i-vectors. Results on NIST 2012 SRE show that senone i-vectors outperform the conventional GMM i-vectors. More interestingly, the BN features are not only phonetically discriminative; the results suggest that they also contain sufficient speaker information to produce BN-based senone i-vectors that outperform the conventional senone i-vectors. This work also shows that DAE training is more beneficial to BN feature extraction than to senone posterior estimation. |
---|---|
ISSN: | 2329-9290, 2329-9304 |
DOI: | 10.1109/TASLP.2018.2796843 |
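
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random weights standing in for trained parameters, the choice to feed the DAE's reconstructed frame into the upper layers, the linear bottleneck, and all function names are assumptions made only for illustration. The sketch passes noisy frames through a DAE and a stacked network to obtain BN features and senone posteriors, then accumulates the posterior-weighted zeroth- and first-order statistics that i-vector extraction normally consumes.

```python
import math
import random

random.seed(0)

def init_layer(n_in, n_out):
    """Random weights and zero biases; a stand-in for trained parameters."""
    w = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def affine(layer, x):
    """Compute W x + b for one input vector."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

# Hypothetical sizes, chosen only to keep the example small.
FEAT_DIM, HID_DIM, BN_DIM, N_SENONES = 6, 8, 3, 4

dae_enc = init_layer(FEAT_DIM, HID_DIM)   # DAE encoder
dae_dec = init_layer(HID_DIM, FEAT_DIM)   # DAE decoder (reconstructs a cleaner frame)
dnn_hid = init_layer(FEAT_DIM, HID_DIM)   # hidden layer stacked on the DAE (assumed wiring)
dnn_bn = init_layer(HID_DIM, BN_DIM)      # bottleneck layer (assumed linear)
dnn_out = init_layer(BN_DIM, N_SENONES)   # senone output layer

def dae_dnn_forward(noisy_frame):
    """Return (BN features, senone posteriors) for one acoustic frame."""
    denoised = affine(dae_dec, sigmoid(affine(dae_enc, noisy_frame)))
    hidden = sigmoid(affine(dnn_hid, denoised))
    bn = affine(dnn_bn, hidden)
    posteriors = softmax(affine(dnn_out, bn))
    return bn, posteriors

def sufficient_stats(frames):
    """Accumulate posterior-weighted zeroth- and first-order statistics of BN features."""
    n = [0.0] * N_SENONES
    f = [[0.0] * BN_DIM for _ in range(N_SENONES)]
    for x in frames:
        bn, post = dae_dnn_forward(x)
        for c, gamma in enumerate(post):
            n[c] += gamma
            for d in range(BN_DIM):
                f[c][d] += gamma * bn[d]
    return n, f

# Ten synthetic "noisy" frames in place of real acoustic features.
frames = [[random.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(10)]
n_stats, f_stats = sufficient_stats(frames)
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames. In a real system the weights would come from DAE pretraining and backpropagation fine-tuning, and the statistics would be passed to a total variability model to produce the i-vector.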