Hybrid Feature Optimization for Voice Spoof Detection Using CNN-LSTM

Bibliographic details
Published in: Traitement du signal, 2024-04, Vol. 41 (2), pp. 717-727
Authors: Neelima, Medikonda; Prabha, I Santi
Format: Article
Language: English; French
Description
Abstract: This work develops an Automatic Speaker Verification (ASV) system that distinguishes genuine speech samples from spoofed ones. Features are extracted from the speech samples using Mel-Frequency Cepstral Coefficients (MFCC), Constant Q Cepstral Coefficients (CQCC), and spectrogram feature extraction, which are the most common feature extraction techniques for detecting spoofed voice samples. However, the accuracy of voice spoof detection with these techniques still needs improvement. To improve it, a novel hybrid feature extraction technique is proposed, in which hybrid features are generated by combining relevant features from the three techniques above. The extracted features are fed to a new model that fuses a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network to improve the performance of the overall system. The dataset is split into training and testing samples: the CNN-LSTM model is trained on the training samples and, once training is complete, evaluated on the testing samples. Features are extracted with each of the three techniques individually and with the proposed hybrid technique, and the model's performance is evaluated with a confusion matrix and an ROC curve. Of all the feature extraction techniques compared, the hybrid technique gives the highest test accuracy, 98.48%, and a low Equal Error Rate (EER) of 2.2%. Overall, the hybrid feature extraction approach enables the new CNN-LSTM architecture to achieve the lowest EER among all feature extraction techniques.
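The abstract names MFCC and spectrogram features but gives no extraction parameters. A minimal NumPy sketch of both follows, assuming common defaults that the record does not state (16 kHz audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 26 mel bands, 13 coefficients, HTK-style mel scale). All function names are hypothetical. CQCC is omitted because it requires a constant-Q transform followed by uniform resampling, which does not fit a short example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def frame_signal(x, frame_len, hop):
    # Split a 1-D signal into overlapping frames, zero-padding the tail
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    pad = (n_frames - 1) * hop + frame_len - len(x)
    x = np.pad(x, (0, max(0, pad)))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                      # (n_frames, frame_len)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii_ortho(n_out, n_in):
    # First n_out rows of the orthonormal DCT-II matrix
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_and_log_spectrogram(x, sr=16000, n_fft=512, frame_len=400, hop=160,
                             n_mels=26, n_mfcc=13):
    # Defaults are assumed common values, not parameters reported in the paper
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_spec = np.log(power + 1e-10)                   # log-power spectrogram
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    mfcc = np.log(mel_energy + 1e-10) @ dct_ii_ortho(n_mfcc, n_mels).T
    return mfcc.T, log_spec.T                          # (n_mfcc, T), (n_fft//2 + 1, T)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)        # 1 s of noise standing in for a speech sample
mfcc, spec = mfcc_and_log_spectrogram(x)
print(mfcc.shape, spec.shape)         # (13, 99) (257, 99)
```

Both outputs are laid out as (coefficients × frames), the orientation a CNN typically treats as a single-channel image.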
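The record says the hybrid features combine "relevant features" from the three techniques but does not say how they are selected or joined. The sketch below shows one plausible fusion scheme, not the authors' method: each (coefficients × frames) matrix is truncated to a chosen number of leading rows, linearly resampled to a common frame count, standardized per coefficient, and stacked. The `keep` truncation is a hypothetical stand-in for the unspecified selection step, and random matrices stand in for real MFCC, CQCC, and spectrogram features.

```python
import numpy as np

def resample_frames(feat, n_frames):
    # Linearly interpolate a (dim, T) feature matrix along time to (dim, n_frames)
    src = np.linspace(0.0, 1.0, feat.shape[1])
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(dst, src, row) for row in feat])

def standardize(feat, eps=1e-8):
    # Zero mean, unit variance per coefficient (row) over time
    mu = feat.mean(axis=1, keepdims=True)
    sd = feat.std(axis=1, keepdims=True)
    return (feat - mu) / (sd + eps)

def hybrid_features(streams, n_frames, keep=None):
    """Fuse (dim_i, T_i) matrices into one (sum of kept rows, n_frames) matrix.

    keep[i] = number of leading rows kept from stream i; a hypothetical
    stand-in for the paper's unspecified 'relevant feature' selection.
    """
    parts = []
    for i, feat in enumerate(streams):
        k = feat.shape[0] if keep is None else keep[i]
        parts.append(standardize(resample_frames(feat[:k], n_frames)))
    return np.concatenate(parts, axis=0)

rng = np.random.default_rng(1)
mfcc = rng.standard_normal((13, 99))     # placeholder MFCC matrix
cqcc = rng.standard_normal((20, 120))    # placeholder CQCC, different frame count
spec = rng.standard_normal((257, 99))    # placeholder log-spectrogram
fused = hybrid_features([mfcc, cqcc, spec], n_frames=100, keep=[13, 20, 40])
print(fused.shape)                       # (73, 100)
```

Resampling to a shared frame count matters because CQCC is often computed at a different frame rate than MFCC; per-coefficient standardization keeps streams with larger magnitudes, such as log-power values, from dominating the fused input.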
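Test accuracy comes from the confusion matrix, while the EER is the point on the ROC curve where the false acceptance rate (spoofs accepted as genuine) equals the false rejection rate (genuine samples rejected). The sketch below computes both from per-sample scores. It assumes class 1 = genuine, class 0 = spoof, and that higher scores mean "more likely genuine". The record does not state the model's score convention, and the toy scores are made up for illustration. The threshold sweep picks the observed threshold where FAR and FRR are closest and averages them, a common approximation rather than an interpolated EER.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred):
    # Rows = true class, columns = predicted class (0 = spoof, 1 = genuine)
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(np.asarray(y_true), np.asarray(y_pred)):
        cm[t, p] += 1
    return cm, np.trace(cm) / cm.sum()

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping the threshold over all observed scores.

    A sample is accepted as genuine when score >= threshold.
    FAR = accepted spoofs / all spoofs; FRR = rejected genuine / all genuine.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    genuine, spoof = scores[labels == 1], scores[labels == 0]
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    far = np.array([(spoof >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))      # threshold where FAR and FRR are closest
    return (far[i] + frr[i]) / 2, thresholds[i]

scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]   # toy scores, not from the paper
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, thr = equal_error_rate(scores, labels)
y_pred = (np.asarray(scores) >= thr).astype(int)
cm, acc = confusion_and_accuracy(labels, y_pred)
print(eer, thr)   # 0.25 0.6
print(cm)         # [[3 1]
                  #  [1 3]]
print(acc)        # 0.75
```

Plotting `1 - frr` against `far` over the same thresholds traces the ROC curve on which the EER point lies.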
ISSN: 0765-0019, 1958-5608
DOI: 10.18280/ts.410214