Two-level Noise Robust and Block Featured PNN Model for Speaker Recognition in Real Environment


Bibliographic Details
Published in: Wireless Personal Communications, 2022, Vol. 125 (4), pp. 3741-3771
Author: Juneja, Kapil
Format: Article
Language: English
Description
Abstract: Speaker recognition is gaining popularity for device- and application-specific verification and validation, because it avoids complex textual passwords and the burden of remembering them. Various devices and applications have adopted speaker-based verification to secure online and offline access. However, speaker recognition is also affected by device- and environment-specific disturbances. In this paper, a two-level noise-robust PNN model (2LNR-PNN) is presented for reliable speaker recognition. Noise is handled at two levels: the pre-processing stage and the feature-set generation stage. High-level noise and situational disturbances are addressed using spectral subtraction and a Gaussian mixture model (GMM). The noise-rectified signal is then processed with frequency- and window-based computation to extract MFCC, LPC, and statistical features. This composite feature set is passed to a Probabilistic Neural Network (PNN) to identify the speaker. The proposed model was evaluated on the THUYG-20 SRE corpus and a self-collected real-time dataset. Separate experiments were conducted under different noise conditions, with car, fan, white, cafeteria, and babble noise. The results were compared against various feature processors and against machine learning and deep learning models. Performance was measured using accuracy, equal error rate (EER), and false rejection rate (FRR). The proposed model reports an average accuracy above 80% and a maximum FRR of 0.2 across these noise types at SNRs of 1 dB, 5 dB, and 9 dB. It outperformed the machine learning and deep learning models it was compared against, with a significant performance gain.
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-022-09734-7
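The abstract names a Probabilistic Neural Network (PNN) as the final speaker classifier. The following is a minimal sketch of the standard Parzen-window PNN, not the author's implementation. It assumes one shared Gaussian smoothing parameter `sigma`, equal class priors, and feature vectors that have already been extracted (in the paper's pipeline these would be the composite MFCC, LPC, and statistical features computed after noise handling). The function names, the `sigma` value, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def pnn_scores(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    x: Sequence[float],
    sigma: float = 0.5,  # hypothetical smoothing parameter, not from the paper
) -> Dict[Hashable, float]:
    """Return an (unnormalised) class-conditional density estimate per speaker."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must have the same length")

    kernel_sum: Dict[Hashable, float] = defaultdict(float)
    count: Dict[Hashable, int] = defaultdict(int)
    for vec, label in zip(train_x, train_y):
        # Pattern layer: one Gaussian kernel unit per stored training vector.
        sq_dist = sum((a - b) ** 2 for a, b in zip(vec, x))
        kernel_sum[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        count[label] += 1

    # Summation layer: average the kernels of each class. The shared factor
    # (2*pi)^(d/2) * sigma^d is omitted because it does not change the argmax.
    return {label: kernel_sum[label] / count[label] for label in kernel_sum}


def pnn_classify(train_x, train_y, x, sigma: float = 0.5) -> Hashable:
    """Output (decision) layer: pick the class with the highest density (equal priors assumed)."""
    scores = pnn_scores(train_x, train_y, x, sigma)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D vectors for two hypothetical speakers (not real MFCC/LPC features).
    X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
    y = ["spk_a", "spk_a", "spk_a", "spk_b", "spk_b", "spk_b"]
    print(pnn_classify(X, y, (0.15, 0.1)))  # spk_a
    print(pnn_classify(X, y, (2.1, 2.0)))   # spk_b
```

A PNN has no iterative training: the training vectors themselves are stored as the pattern layer, and `sigma` is the only parameter to tune. This is one reason the model is attractive for enrolling new speakers, although inference cost grows with the number of stored vectors.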