An experimental framework for Arabic digits speech recognition in noisy environments

Bibliographic details
Published in: International journal of speech technology 2017-06, Vol.20 (2), p.205-224
Main authors: Touazi, Azzedine, Debyeche, Mohamed
Format: Article
Language: English
Description
Summary: In this paper we present ARADIGITS-2, an experimental framework for Arabic isolated-digit speech recognition. The framework evaluates the performance of a Distributed Speech Recognition system on Modern Standard Arabic in noisy environments at various Signal-to-Noise Ratio (SNR) levels. The data preparation and evaluation scripts follow a methodology similar to that of the AURORA-2 database. The original speech data contains a total of 2704 clean utterances, spoken by 112 native Algerian speakers (56 male and 56 female) and down-sampled to 8 kHz. The feature vectors, which consist of Mel Frequency Cepstral Coefficients and log energy, are extracted from the speech samples using the ETSI Advanced Front-End (ETSI-AFE) standard, whereas the Hidden Markov Model Toolkit (HTK) is used to build the speech recognition engine. The recognition task is conducted in speaker-independent mode, considering both the word and the syllable as acoustic units. The Hidden Markov Model (HMM) parameters and the temporal derivatives window are tuned through a series of experiments performed in the two training modes: clean and multi-condition. Better results are obtained by exploiting the polysyllabic nature of Arabic digits. These results show the effectiveness of the syllable-like unit for building an Arabic digit recognition system: it exceeds the word-like unit in overall Word Accuracy Rate by 0.44% and 0.58% for the clean and multi-condition training modes, respectively.
ISSN:1381-2416
1572-8110
DOI:10.1007/s10772-017-9400-x
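
The summary mentions evaluation at various SNR levels without giving the mixing procedure. As a rough illustration only, the sketch below shows one common way to add noise to a clean utterance at a target SNR. It uses plain mean power over the whole signal, which is an assumption and not necessarily how ARADIGITS-2 or AURORA-2 measure speech level. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic test tone, and the listed SNR values are hypothetical and do not come from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so that the mixture has the requested SNR (dB).

    Assumption: SNR is defined on plain mean power over the full signal.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a noisy signal, given the clean reference it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # sampling rate in Hz, matching the 8 kHz stated in the summary
    # Stand-ins for real data: a 440 Hz tone as "speech", Gaussian noise as "noise".
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for snr_db in (20, 10, 0):  # illustrative levels, not taken from the paper
        noisy = mix_at_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

In a real setup the clean utterance and a noise segment of equal length would replace the synthetic signals, and the speech level would likely be measured over active speech only rather than the whole file.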