Analysis and extraction of LP-residual for its application in speaker verification system under uncontrolled noisy environment


Full Description

Saved in:
Bibliographic Details
Published in: Multimedia Tools and Applications 2017, Vol. 76 (1), p. 757-784
Main Authors: Misra, Songhita, Laskar, Rabul Hussain, Baruah, U., Das, T. K., Saha, P., Choudhury, S. P.
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: Sub-segmental analysis of the excitation source may capture significant speaker-specific information relevant to speaker verification. In this paper, excitation source features are explored for the design of a speaker verification system (SVS). The baseline of the system is the extraction of speaker-specific information from LP-residual features by modelling the speakers with different supervised and unsupervised models, on the basis of which speakers are accepted or rejected. The direct LP-residual (DLR) as well as the DCT coefficients of the LP-residual (DCTLR) are used as the excitation source features for the system. The models are processed at two different levels of analysis, namely sentence-level analysis and the voice-segment-level approach (VSLA), with variations in the frame size of the input. The effects of changing the input frame size are observed. The studies are carried out on a telephonic database collected in a practical environment. A comparative analysis is presented for the combinations of models, features, and the two levels of analysis on the given data. The experimental study suggests that applying VSLA to unsupervised models with DCTLR as input yields a performance 14.21 % better than sentence-level analysis of the models.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-015-3020-8
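To illustrate the DCTLR feature named in the summary, the following is a minimal sketch of extracting the LP residual of one speech frame and taking DCT coefficients of that residual. It is not the authors' implementation: the LPC order (12), the frame length, the Hamming window, and the number of retained coefficients (20) are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import lfilter

def lpc_coefficients(frame, order):
    """LPC coefficients via the Levinson-Durbin recursion on the frame's autocorrelation."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.array([1.0])  # A(z) polynomial, a[0] = 1
    err = r[0]           # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for order i
        acc = np.dot(a, r[i::-1][: len(a)])
        k = -acc / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a  # [1, a1, ..., a_order]

def dctlr_features(frame, order=12, n_coeffs=20):
    """Return the LP residual of a frame and the first DCT coefficients of it (DCTLR)."""
    a = lpc_coefficients(frame, order)
    # Applying the inverse (analysis) filter A(z) to the frame yields the residual
    residual = lfilter(a, [1.0], frame)
    return residual, dct(residual, type=2, norm="ortho")[:n_coeffs]

# Usage on a synthetic frame (noisy sinusoid standing in for voiced speech)
rng = np.random.default_rng(0)
t = np.arange(400) / 8000.0
frame = np.hamming(400) * (np.sin(2 * np.pi * 120.0 * t)
                           + 0.05 * rng.standard_normal(400))
residual, feats = dctlr_features(frame)
```

Since the LP filter models the vocal-tract envelope, most of the frame's energy is removed by the inverse filter, and the residual retains mainly excitation-source structure; the DCT then packs that residual into a fixed-length feature vector.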