Efficient feature extraction and classification for the development of Pashto speech recognition system

Bibliographic details
Published in:	Multimedia Tools and Applications, 2024-05, Vol. 83 (18), p. 54081-54096
Main authors:	Ahmed, Irfan; Irfan, Muhammad Abeer; Iqbal, Abid; Khalil, Amaad; Siddiqui, Salman Ilahi
Format:	Article
Language:	English
Subjects:	Accuracy; Automatic speech recognition; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Deep learning; Dialects; Discrete Wavelet Transform; Feature extraction; Feature recognition; Iranian languages; Machine learning; Multimedia Information Systems; Special Purpose and Application-Based Systems; Speech recognition; Support vector machines; Voice recognition; Wavelet transforms
Online access:	Full text

Description
Abstract:	In this work, a novel framework for the efficient feature extraction and recognition of Pashto speech signals is proposed. The targeted language is a low-resource language and is prone to higher Automatic Speech Recognition (ASR) error rates because of its colloquial dialects. We devised a framework that not only employs classical Machine Learning (ML) models for speech recognition tasks, but also achieves higher accuracy by using optimal feature extraction techniques. The designed feature extraction frameworks are based on two well-known techniques: Discrete Wavelet Transform (DWT) coefficients and Mel-Frequency Cepstral Coefficients (MFCC). We deployed two classical ML models, namely Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN), because of their low computational complexity, energy efficiency, and higher accuracy compared with other ML and Deep Learning (DL) models. Our proposed framework exhibited an improved performance level when trained on a Pashto isolated-words dataset.
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-023-17684-w
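
As a rough illustration of the kind of pipeline the abstract describes (wavelet-based features fed to a classical classifier), the sketch below uses only the Python standard library. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the three decomposition levels, the log-energy sub-band features, the synthetic sine-tone "words", and the helper names haar_dwt, dwt_energy_features, knn_predict and synthetic_word are all hypothetical. MFCC features and the SVM classifier are left out to keep the example short.

```python
import math
import random


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) * s)   # low-pass half: scaled pairwise sum
        detail.append((a - b) * s)   # high-pass half: scaled pairwise difference
    return approx, detail


def dwt_energy_features(signal, levels=3):
    """Log-energy of each detail band, then of the final approximation band.

    Assumed feature layout for illustration; the paper may use another one.
    """
    feats = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(math.log(sum(d * d for d in detail) + 1e-12))
    feats.append(math.log(sum(a * a for a in approx) + 1e-12))
    return feats


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def synthetic_word(freq, n=256, noise=0.05, rng=random):
    """Hypothetical stand-in for an isolated-word recording: a noisy sine tone."""
    return [math.sin(2 * math.pi * freq * t / n) + rng.gauss(0, noise)
            for t in range(n)]


# Build a tiny two-class training set from synthetic tones (not real speech).
rng = random.Random(0)
train = []
for label, freq in (("low", 4), ("high", 60)):
    for _ in range(5):
        train.append((dwt_energy_features(synthetic_word(freq, rng=rng)), label))

# Classify a new tone drawn from the "high" class.
query = dwt_energy_features(synthetic_word(60, rng=rng))
prediction = knn_predict(train, query)
print(prediction)
```

With real recordings, the synthetic tones would be replaced by framed audio samples, and a library wavelet implementation would normally be preferred over this hand-written Haar transform.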