Probabilistic Latent Prosody Analysis for Robust Speaker Verification

Bibliographic Details
Main Authors:	Zi-He Chen, Zhi-Ren Zeng, Yuan-Fu Liao, Yau-Tarng Juang
Format:	Conference Proceeding
Language:	English
Subjects:	Data engineering; Electronic equipment testing; Hidden Markov models; NIST; Reliability engineering; Robustness; Speaker recognition; Speech analysis; System testing; Telephone sets

Description
Abstract:	This investigation proposes two approaches based on probabilistic latent semantic analysis (PLSA) for use in speaker verification systems. Both reduce the number of parameters required by prosodic speaker models, so that (1) speakers' bi-gram models can be estimated reliably and (2) less training and test data is required. The basic concept is to (1) adopt PLSA to smooth the underlying n-gram-based prosodic speaker models and (2) use PLSA to find a compact latent prosody space that efficiently represents the constellation of speakers. The proposed approaches are evaluated on the standard single-speaker detection task of the 2001 NIST Speaker Recognition Evaluation Corpus, where only one 2-minute training enrollment and 30 s of test speech are available on average. Experimental results demonstrated that the proposed approaches can reduce the required number of bi-gram parameters from 112 to 88 and 63 per speaker and improve the equal error rates (EERs) of MAP-GMM and GMM+T-norm from 12.4% and 9.5% to 10.4% and 8.4%, respectively, and finally to 8.1% after fusing all systems.
ISSN:	1520-6149; 2379-190X
DOI:	10.1109/ICASSP.2006.1659968
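
Illustration
The abstract's core idea, using PLSA to smooth per-speaker bi-gram models through a small set of shared latent factors, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each speaker plays the role of a PLSA "document" and each prosodic-state bi-gram the role of a "word", which this record does not confirm. The function names `plsa` and `smoothed_model`, the toy counts, and the number of latent factors are all hypothetical.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a speaker-by-bigram count matrix (hypothetical helper).

    counts[d][w] is how often bi-gram w was observed for speaker d.
    Returns (p_w_z, p_z_d): P(w | z) per latent factor and P(z | d) per speaker.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation of both probability tables.
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Tiny floor keeps every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over latent factors for this (d, w) cell.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                for z in range(n_topics):
                    expected = counts[d][w] * joint[z] / norm
                    # Accumulate expected counts for the M-step.
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        # M-step: renormalise expected counts into probabilities.
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d


def smoothed_model(p_w_z, p_z_d, d):
    """Rebuild the smoothed distribution P(w | d) = sum_z P(z | d) P(w | z)."""
    n_topics, n_words = len(p_w_z), len(p_w_z[0])
    return [sum(p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics))
            for w in range(n_words)]


# Made-up toy data: 3 speakers, 4 bi-gram types, 2 latent factors.
toy_counts = [[8, 2, 0, 0], [7, 3, 0, 0], [0, 0, 5, 5]]
p_w_z, p_z_d = plsa(toy_counts, n_topics=2)
for d in range(len(toy_counts)):
    print(d, [round(p, 3) for p in smoothed_model(p_w_z, p_z_d, d)])
```

In this sketch each speaker is described only by its row of latent-factor weights in `p_z_d`, while the table `p_w_z` is shared across speakers, which is one way a latent space can cut the per-speaker parameter count. Bi-grams never observed for a speaker still receive a small nonzero probability after reconstruction.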