An experimental framework for Arabic digits speech recognition in noisy environments
In this paper we present an experimental framework for Arabic isolated digits speech recognition named ARADIGITS-2. This framework provides a performance evaluation of Modern Standard Arabic devoted to a Distributed Speech Recognition system, under noisy environments at various Signal-to-Noise Ratio...
Gespeichert in:
Veröffentlicht in: | International journal of speech technology 2017-06, Vol.20 (2), p.205-224 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this paper we present an experimental framework for Arabic isolated digits speech recognition named ARADIGITS-2. This framework provides a performance evaluation of Modern Standard Arabic devoted to a Distributed Speech Recognition system, under noisy environments at various Signal-to-Noise Ratio (SNR) levels. The data preparation and the evaluation scripts are designed by deploying a similar methodology to that followed in AURORA-2 database. The original speech data contains a total of 2704 clean utterances, spoken by 112 (56 male and 56 female) Algerian native speakers, down-sampled at 8 kHz. The feature vectors, which consist of a set of Mel Frequency Cepstral Coefficients and log energy, are extracted from speech samples using ETSI Advanced Front-End (ETSI-AFE) standard; whereas, the Hidden Markov Models (HMMs) Toolkit is used for building the speech recognition engine. The recognition task is conducted in speaker-independent mode by considering both word and syllable as acoustic units. Therefore, an optimal fitting of HMM parameters, as well as the temporal derivatives window, is carried out through a series of experiments performed on the two training modes: clean and multi-condition. Better results are obtained by exploiting the polysyllabic nature of Arabic digits. These results show the effectiveness of syllable-like unit in building Arabic digits recognition system, which exceeds word-like unit by an overall Word Accuracy Rate of 0.44 and 0.58% for clean and multi-condition training modes, respectively. |
---|---|
ISSN: | 1381-2416 1572-8110 |
DOI: | 10.1007/s10772-017-9400-x |