Text-dependent and text-independent speaker recognition of reverberant speech based on CNN

Bibliographic details
Published in:	International journal of speech technology 2021-12, Vol.24 (4), p.993-1006
Main authors:	El-Moneim, Samia Abd; Sedik, Ahmed; Nassar, M. A.; El-Fishawy, Adel S.; Sharshar, A. M.; Hassan, Shaimaa E. A.; Mahmoud, Adel Zaghloul; Dessouky, Moawd I.; El-Banby, Ghada M.; El-Samie, Fathi E. Abd; El-Rabaie, El-Sayed M.; Neyazi, Badawi; Seddeq, H. S.; Ismail, Nabil A.; Khalaf, Ashraf A. M.; Elabyad, G. S. M.
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Artificial neural networks; Aspiration; Biometric recognition systems; Engineering; Feature extraction; Image processing; Mass media; Object recognition; Reverberation; Signal processing; Signal, Image and Speech Processing; Social Sciences; Speaker identification; Spectrograms; Speech recognition; Voice recognition
Online access:	Full text

Description
Abstract:	Speaker recognition is a biometric recognition technique of high importance in numerous security and telecommunications applications. The key aim of a speaker recognition system is to identify who is speaking from voice characteristics. This paper presents an extensive study of speaker recognition in both the text-dependent and text-independent cases. Convolutional Neural Network (CNN) based feature extraction is extended to the text-dependent and text-independent speaker recognition tasks. In addition, the effect of reverberation on the speaker recognition system is addressed. All speech signals are converted into images by obtaining their spectrograms. Two CNN models are proposed for efficient speaker recognition from clean and reverberant speech signals. They depend on image processing concepts applied to spectrograms of speech signals. One of the proposed models is compared with a conventional benchmark model in the text-independent scenario. The performance of the recognition system is measured by the recognition rate in the cases of clean and reverberant speech.
ISSN:	1381-2416 1572-8110
DOI:	10.1007/s10772-021-09805-3