Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory

Bibliographic Details
Published in:	ARO (Koya) 2024-09, Vol.12 (2), p.119-129
Main Authors:	Mirza, Arsalan R., Al-Talabani, Abdulbasit K.
Format:	Article
Language:	English
Subjects:	Bidirectional Long Short-Term Memory; Constant Q cepstral coefficients; Countermeasure Spoofing; Long Short-Term Memory; Mel-frequency cepstral coefficients; Open-source speech and music interpretation by large-space extraction
Online Access:	Full text

Description
Abstract:	Detecting fake speech in voice-based authentication systems is crucial for reliability. Traditional methods often struggle because they cannot capture the complex temporal patterns in speech. Our study introduces an approach using deep learning, specifically Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) models, tailored for identifying fake speech based on its temporal characteristics. We represent speech signals with cepstral features, namely Mel-frequency cepstral coefficients (MFCC), Constant Q cepstral coefficients (CQCC), and features from the open-source Speech and Music Interpretation by Large-space Extraction toolkit (OpenSMILE), so that the models learn these patterns directly. Testing on the ASVspoof 2019 Logical Access dataset, we report the minimum tandem detection cost function (min-tDCF), Equal Error Rate (EER), Recall, Precision, and F1-score. Our results show that LSTM and BiLSTM models significantly enhance the reliability of spoof speech detection systems.
ISSN:	2410-9355 2307-549X
DOI:	10.14500/aro.11636
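
Note on the Equal Error Rate (EER) named in the abstract: the EER is the operating point at which the false acceptance rate (spoofed speech accepted) equals the false rejection rate (genuine speech rejected). The record does not describe how the authors computed it, so the following is only a minimal sketch under stated assumptions: higher scores mean "more likely genuine", the function name equal_error_rate and the toy scores are hypothetical, and the simple threshold sweep stands in for the interpolated procedure that official scoring tools typically use.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as genuine when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between FAR and FRR, averaged error rate, threshold)
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration (not results from the article).
genuine = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoofed)
print(eer, threshold)  # 0.25 0.6
```

With these toy scores, one genuine trial falls below the threshold of 0.6 and one spoofed trial reaches it, so both error rates are 1/4. A lower EER indicates a better countermeasure.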