Analysis of cosine distance features for speaker verification

Bibliographic details
Published in: Pattern recognition letters 2018-09, Vol.112, p.285-289
Main authors: George, Kuruvachan K.; Kumar, C. Santhosh; Sivadas, Sunil; Ramachandran, K.I.; Panda, Ashish
Format: Article
Language: eng
Description
Summary:
• Explores a new feature vector for speaker recognition, referred to as cosine distance feature (CDF).
• Analyzes and understands the CDF vector to refine it for further performance improvement.
• Defines a meaningful similarity measure between two CDF vectors.
• Explores a sparse representation of the CDF vector for enhancing its speaker discrimination capability.
• Refines CDF vector by introducing speaker-specific CDF representation.

In this paper, we describe a method for representing the acoustic similarity of a target speaker with respect to a set of known speakers as a feature for speaker verification. We propose a novel distance-based representation that encodes the cosine distance between the i-vectors of utterances belonging to the target speaker and those of the reference speakers. The new feature is referred to as the cosine distance feature (CDF) and is used with a support vector machine (SVM) classifier (CDF-SVM). We show that reference speakers who rank high in acoustic similarity to the target speaker are more important for speaker discrimination. A sparse representation of the CDF that retains only a few of the largest values, which correspond to the most similar reference speakers in the CDF vector, is found to perform better than the baseline CDF system. We also explore a speaker-specific CDF, in which each target speaker has a specific subset of the most acoustically similar reference speakers. We show that the acoustic similarities between the target and reference speakers are best captured using an intersection kernel SVM. Experimental results on the core short2-short3 condition of NIST 2008 SRE, for both female and male trials, show that the speaker-specific CDF outperforms the state-of-the-art speaker verification systems based on i-vectors and on the speaker-independent CDF.
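
The pipeline described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the "cosine distance" is treated as the cosine similarity score (which fits the statement that the largest values correspond to the most similar reference speakers), the i-vectors are random placeholders, and the function names, the 400-dimensional i-vectors, the 50 reference speakers, and k = 10 are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_scores(ivector, reference_ivectors):
    """Cosine similarity between one i-vector and each reference i-vector.

    Returns one score per reference speaker; this plays the role of the
    CDF vector (assumption: 'cosine distance' means the cosine score).
    """
    x = ivector / np.linalg.norm(ivector)
    refs = reference_ivectors / np.linalg.norm(
        reference_ivectors, axis=1, keepdims=True
    )
    return refs @ x


def sparsify_top_k(cdf, k):
    """Keep the k largest entries of a CDF vector and zero the rest.

    Assumes 1 <= k <= len(cdf).
    """
    sparse = np.zeros_like(cdf)
    top = np.argsort(cdf)[-k:]  # indices of the k largest scores
    sparse[top] = cdf[top]
    return sparse


def intersection_kernel(A, B):
    """Gram matrix with K[i, j] = sum_d min(A[i, d], B[j, d])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


# Placeholder data: random vectors stand in for real i-vectors.
rng = np.random.default_rng(0)
reference_ivectors = rng.standard_normal((50, 400))  # hypothetical reference set
utterances = rng.standard_normal((4, 400))           # hypothetical test utterances

# One sparse CDF vector per utterance, then the pairwise kernel matrix.
cdfs = np.stack(
    [sparsify_top_k(cosine_scores(u, reference_ivectors), k=10) for u in utterances]
)
gram = intersection_kernel(cdfs, cdfs)
```

A Gram matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel. How the paper selects the speaker-specific subset of reference speakers is not stated in the summary, so that step is left out of the sketch.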
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2018.08.019