The IBM Speaker Recognition System: Recent Advances and Error Analysis
Format: | Article |
Language: | English |
Abstract: | We present recent advances in, and an error analysis of, the IBM
speaker recognition system for conversational speech. Key advancements that
contribute to our system include: a nearest-neighbor discriminant analysis
(NDA) approach (as opposed to linear discriminant analysis, LDA) for
intersession variability compensation in the i-vector space; the application
of speaker- and channel-adapted features derived from an automatic speech
recognition (ASR) system for speaker recognition; and the use of a deep neural
network (DNN) acoustic model with a very large number of output units (~10k
senones) to compute the frame-level soft alignments required in the i-vector
estimation process. We evaluate these techniques on the NIST 2010 SRE extended
core conditions (C1-C9), as well as the 10sec-10sec condition. To our
knowledge, the results achieved by our system represent the best performance
published to date on these conditions. For example, on the extended tel-tel
condition (C5) the system achieves an equal error rate (EER) of 0.59%. To gain
further understanding of the remaining errors (on C5), we examine the
recordings associated with the low-scoring target trials and identify various
issues in the problematic recordings and trials. Interestingly, correcting the
pathological recordings improves the scores not only of the target trials but
also of the nontarget trials. |
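The NDA step can be illustrated with a minimal NumPy sketch. It is not the IBM implementation: it assumes a Fukunaga-style nonparametric discriminant analysis, in which the between-class scatter is built from each i-vector's offset to the local mean of its K nearest neighbors in every other class, weighted by a distance ratio that emphasizes samples near class boundaries. The within-class scatter is the same as in LDA, and the projection comes from the generalized eigenproblem S_b v = λ S_w v. The function name `nda_projection` and the parameters `k`, `alpha`, and `n_dims` are hypothetical choices for illustration.

```python
import numpy as np


def nda_projection(x: np.ndarray, labels: np.ndarray, k: int = 5,
                   alpha: float = 1.0, n_dims: int = 1) -> np.ndarray:
    """Hypothetical NDA sketch: returns a (D, n_dims) projection matrix.

    x:        (N, D) array of i-vectors
    labels:   (N,) array of speaker labels
    k, alpha: illustrative settings, not the values used in the paper
    """
    classes = np.unique(labels)
    dim = x.shape[1]

    # Within-class scatter: identical to LDA.
    s_w = np.zeros((dim, dim))
    for c in classes:
        xc = x[labels == c]
        centered = xc - xc.mean(axis=0)
        s_w += centered.T @ centered

    # Nonparametric between-class scatter: offsets from each sample to the
    # local mean of its k nearest neighbors in every other class.
    s_b = np.zeros((dim, dim))
    for i in classes:
        xi = x[labels == i]
        for j in classes:
            if i == j:
                continue
            xj = x[labels == j]
            for xl in xi:
                d_same = np.sort(np.linalg.norm(xi - xl, axis=1))
                d_other = np.linalg.norm(xj - xl, axis=1)
                nn_idx = np.argsort(d_other)[:k]
                local_mean = xj[nn_idx].mean(axis=0)
                # Distance to the k-th neighbor in each class
                # (d_same[0] is the sample itself, at distance 0).
                r_same = d_same[min(k, len(d_same) - 1)] ** alpha
                r_other = d_other[nn_idx[-1]] ** alpha
                # Weight is ~0.5 near the class boundary, ~0 far from it.
                total = r_same + r_other
                w = min(r_same, r_other) / total if total > 0 else 0.5
                delta = xl - local_mean
                s_b += w * np.outer(delta, delta)

    # Solve S_b v = lambda S_w v by whitening with the Cholesky factor of S_w.
    chol_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    eigvals, eigvecs = np.linalg.eigh(chol_inv @ s_b @ chol_inv.T)
    top = np.argsort(eigvals)[::-1][:n_dims]
    return chol_inv.T @ eigvecs[:, top]
```

The intended difference from LDA is that LDA's between-class scatter uses only global class means, whereas the local neighbor means let the projection follow class boundaries that are not well described by a single mean per speaker.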
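The role of the DNN's frame-level soft alignments can be illustrated by the zeroth- and first-order Baum-Welch statistics that standard i-vector extraction consumes. In this pure-Python sketch, the per-frame posteriors γ_t(c) over C classes (senones, in the DNN setting) are assumed to be given, and the per-class means m_c are assumed to have been estimated beforehand; the helper name `baum_welch_stats` and its argument layout are hypothetical. It computes N_c = Σ_t γ_t(c) and the centered first-order statistic F̃_c = Σ_t γ_t(c) x_t − N_c m_c.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical helper: zeroth- and centered first-order statistics.

    features:   T frames, each a D-dim feature vector x_t
    posteriors: T frames, each a C-dim soft alignment gamma_t(c)
                (assumed to come from a DNN senone classifier)
    means:      C per-class means m_c (assumed precomputed)
    """
    num_classes = len(means)
    dim = len(means[0])
    n = [0.0] * num_classes                        # N_c = sum_t gamma_t(c)
    f = [[0.0] * dim for _ in range(num_classes)]  # F_c = sum_t gamma_t(c) x_t
    for x_t, gamma_t in zip(features, posteriors):
        for c, g in enumerate(gamma_t):
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x_t[d]
    # Center the first-order statistics: F~_c = F_c - N_c * m_c.
    f_centered = [
        [f[c][d] - n[c] * means[c][d] for d in range(dim)]
        for c in range(num_classes)
    ]
    return n, f_centered
```

For example, `baum_welch_stats([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.5, 0.5]], [[0.0, 0.0], [1.0, 1.0]])` returns `([1.5, 0.5], [[2.5, 4.0], [1.0, 1.5]])`. Replacing a GMM-UBM with a DNN changes only where γ_t(c) comes from; the statistics themselves are accumulated the same way.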
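The EER quoted for C5 is the operating point at which the miss rate on target trials equals the false-alarm rate on nontarget trials. An EER of 0.59% means that, at that threshold, roughly 0.59% of target trials are rejected and 0.59% of nontarget trials are accepted. The sketch below approximates the EER by sweeping a decision threshold over the observed scores. The function name `equal_error_rate` is hypothetical, and evaluation tools typically interpolate along the DET or ROC curve, so values on small trial lists may differ slightly from this simple estimate.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. At each candidate
    threshold, compute the miss rate (rejected targets) and the
    false-alarm rate (accepted nontargets); return their average at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    n_tar = len(target_scores)
    n_non = len(nontarget_scores)
    best_gap = float("inf")
    eer = 1.0
    for thr in thresholds:
        p_miss = sum(s < thr for s in target_scores) / n_tar
        p_fa = sum(s >= thr for s in nontarget_scores) / n_non
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap = gap
            eer = (p_miss + p_fa) / 2
    return eer
```

For instance, with target scores `[0.9, 0.8, 0.7, 0.3]` and nontarget scores `[0.6, 0.4, 0.2, 0.1]`, a threshold of 0.6 misses one target and accepts one nontarget, giving an EER of 0.25.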
DOI: | 10.48550/arxiv.1605.01635 |