Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

Bibliographic Details
Published in:	Arabian journal for science and engineering (2011) 2020-03, Vol.45 (3), p.1581-1597
Main authors:	Javed, Muhammad; Baig, Mirza Muhammad Ali; Qazi, Saad Ahmed
Format:	Article
Language:	English
Subjects:	Alignment; Engineering; Errors; Feature extraction; Humanities and Social Sciences; Markov chains; multidisciplinary; Phonemes; Research Article - Electrical Engineering; Science; Segmentation; Speech processing; Speech recognition; Vocal tract
Online access:	Full text

Description
Abstract:	Automatic speech segmentation identifies the boundaries of phonemes in a given utterance. This paper presents a strategy driven by cosine distance similarity scores for identifying phoneme boundaries. The proposed strategy helps in selecting an appropriate feature extraction technique for speech segmentation applications. After assessing various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of the vocal tract (FICV) is developed. The proposed technique is evaluated on a Classical Arabic dataset. Extensive experiments compare the proposed technique with state-of-the-art techniques, including hidden Markov model-based forced alignment procedures. The results show that the proposed technique has a total error rate of 14.48%, while its accuracy is 85.2% within a 10 ms alignment error. Compared with the existing state-of-the-art technique, the proposed technique outperforms it by 12.29% and 22.73% in terms of error rate and alignment accuracy, respectively, which signifies the potential of using FICV in speech segmentation.
ISSN:	2193-567X, 1319-8025, 2191-4281
DOI:	10.1007/s13369-019-04065-5
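
Illustrative note: the abstract describes boundary detection driven by cosine distance scores and an accuracy figure measured within a 10 ms alignment error. The sketch below shows one generic way such a pipeline might look; it is not the authors' FICV method. The function names, the 10 ms frame hop, the 0.5 threshold, and the simple peak-picking rule are all assumptions made for illustration, and the input is any per-frame feature matrix rather than the FICV features themselves.

```python
import numpy as np


def cosine_distances(frames):
    """Cosine distance between each pair of consecutive feature frames."""
    frames = np.asarray(frames, dtype=float)
    a, b = frames[:-1], frames[1:]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Assumption: an all-zero frame is treated as identical to its neighbour.
    sims = np.divide(np.sum(a * b, axis=1), norms,
                     out=np.ones_like(norms), where=norms > 0)
    return 1.0 - sims


def candidate_boundaries(frames, threshold=0.5, hop_ms=10.0):
    """Return boundary times (ms) at local peaks of the distance curve.

    threshold and hop_ms are illustrative values, not taken from the paper.
    """
    d = cosine_distances(frames)
    times = []
    for i, value in enumerate(d):
        left = d[i - 1] if i > 0 else -np.inf
        right = d[i + 1] if i + 1 < len(d) else -np.inf
        if value >= threshold and value > left and value >= right:
            # The boundary sits between frame i and frame i + 1.
            times.append((i + 1) * hop_ms)
    return times


def accuracy_within(pred_ms, ref_ms, tol_ms=10.0):
    """Fraction of reference boundaries with a prediction within tol_ms."""
    if not ref_ms:
        return 0.0
    hits = sum(any(abs(p - r) <= tol_ms for p in pred_ms) for r in ref_ms)
    return hits / len(ref_ms)


if __name__ == "__main__":
    # Toy features: two frames of one "sound", then two frames of another.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    predicted = candidate_boundaries(toy)
    print(predicted, accuracy_within(predicted, [25.0, 100.0]))
```

On the toy input, the distance curve peaks only where the feature direction changes, so a single boundary is proposed between the second and third frames. A real system would replace the toy matrix with per-frame acoustic features and would likely need smoothing before peak picking.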