A robust accent classification system based on variational mode decomposition

Full description

Bibliographic details
Published in:	Engineering applications of artificial intelligence, 2025-01, Vol. 139, p. 109512, Article 109512
Main authors:	Subhash, Darshana; G., Jyothish Lal; B., Premjith; Ravi, Vinayakumar
Format:	Article
Language:	eng
Keywords:	Accent classification; Automatic speech recognition; Deep learning; Machine learning; Mel-frequency cepstral coefficients; Variational mode decomposition
Online access:	Full text

Description
Abstract:	State-of-the-art automatic speech recognition models often struggle to capture nuanced features inherent in accented speech, leading to sub-optimal performance in speaker recognition based on regional accents. Despite substantial progress in the field of automatic speech recognition, ensuring robustness to accents and generalization across dialects remains a persistent challenge, particularly in real-time settings. In response, this study introduces a novel approach leveraging Variational Mode Decomposition (VMD) to enhance accented speech signals, aiming to mitigate noise interference and improve generalization on unseen accented speech datasets. Our method employs decomposed modes of the VMD algorithm for signal reconstruction, followed by feature extraction using Mel-Frequency Cepstral Coefficients (MFCC). These features are subsequently classified using machine learning models such as a 1D Convolutional Neural Network (1D-CNN), Support Vector Machine (SVM), Random Forest, and Decision Trees, as well as a deep learning model based on a 2D Convolutional Neural Network (2D-CNN). Experimental results demonstrate superior performance, with the SVM classifier achieving an accuracy of approximately 87.5% on a standard dataset and 99.3% on the AccentBase dataset. The 2D-CNN model further improves the results in multi-class accent classification tasks. This research contributes to advancing automatic speech recognition robustness and accent-inclusive speaker recognition, addressing critical challenges in real-world applications.
ISSN:	0952-1976
DOI:	10.1016/j.engappai.2024.109512
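
Illustrative sketch (not taken from the article): the abstract describes enhancing a signal by decomposing it with VMD and reconstructing it from the decomposed modes before MFCC extraction. The NumPy code below is a simplified take on that first stage only. It assumes a frequency-domain VMD without boundary mirroring and without the Lagrangian update; the function names `vmd` and `reconstruct`, the parameter values (`alpha`, `n_modes`, the initial center frequencies), the synthetic two-tone test signal, and the choice of which modes to keep are all hypothetical and are not stated in the record.

```python
import numpy as np


def vmd(signal, n_modes=2, alpha=2000.0, n_iter=200, tol=1e-7):
    """Simplified variational mode decomposition (assumption: no mirroring,
    no Lagrange multiplier update). Returns (modes, center_frequencies)."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n)           # cycles per sample, 0 .. 0.5
    f_hat = np.fft.rfft(signal)          # one-sided spectrum of the input
    u_hat = np.zeros((n_modes, freqs.size), dtype=complex)
    # Assumed initialisation: center frequencies spread evenly over 0 .. 0.5.
    omega = 0.5 * (np.arange(n_modes) + 0.5) / n_modes

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            # Residual spectrum after removing all other modes.
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-like filter centred on the current center frequency.
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # New center frequency: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Stop once the modes change very little between iterations.
        change = np.sum(np.abs(u_hat - u_prev) ** 2)
        if change / (np.sum(np.abs(u_prev) ** 2) + 1e-12) < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)  # back to the time domain
    return modes, omega


def reconstruct(modes, keep):
    """Sum the selected modes into one enhanced signal."""
    return modes[list(keep)].sum(axis=0)


# Synthetic stand-in for a speech frame: two tones plus white noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
clean = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)
noisy = clean + 0.1 * rng.standard_normal(t.size)

modes, centers = vmd(noisy, n_modes=2)
enhanced = reconstruct(modes, keep=[0, 1])  # assumed choice: keep both modes
```

In a fuller pipeline, MFCCs could then be computed from `enhanced` (for example with `librosa.feature.mfcc`) and passed to a classifier such as `sklearn.svm.SVC`. The record does not say which tooling or settings the authors used.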