Reducing computational and memory cost for cellular phone embedded speech recognition system

Bibliographic details
Main authors: Levy, C., Linares, G., Nocera, P., Bonastre, J.-F.
Format: Conference paper
Language: English
Description
Abstract: We present several methods for fitting the requirements of a speech recognition system to the resources of a cellular phone. The proposed techniques are evaluated on a digit recognition task using both French and English corpora. We investigate three aspects of speech processing in particular: acoustic parameterization, recognition algorithms, and acoustic modeling. Several parameterization algorithms (LPCC, MFCC and PLP) are compared to the linear predictive coding (LPC) included in the GSM standard. The MFCC and PLP parameterization algorithms perform significantly better than the others. Moreover, the feature vector size can be reduced to 6 PLP coefficients, decreasing memory and computation requirements without a significant loss of performance. To achieve good performance with reasonable resource needs, we develop several methods for embedding a classical HMM-based speech recognition system in a cellular phone. We first propose automatic on-line construction of a phonetic lexicon, which allows a minimal but unlimited lexicon. We then reduce the HMM complexity by decreasing the number of Gaussian components per state. Finally, we evaluate our proposals by comparing dynamic time warping (DTW) with our HMM system, in the cellular phone context, under clean conditions. The experiments show that our HMM system outperforms DTW for speaker-independent tasks and allows more practical applications for the cellular phone user interface.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2004.1327109
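
The abstract uses dynamic time warping (DTW) as the baseline against which the HMM system is compared. As a rough illustration of what a template-based DTW digit recognizer does, here is a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `recognize`, the Euclidean local cost, the unconstrained step pattern, and the one-template-per-word dictionary are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors.

    Assumption: Euclidean local distance and the basic three-way step
    pattern (insertion, deletion, match) with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance.

    `templates` is a hypothetical dict mapping a word label to one
    reference sequence of feature vectors.
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

In this sketch each frame would be a short feature vector, for example the 6 PLP coefficients mentioned in the abstract. Because every word needs a stored reference recording, a recognizer of this kind is tied to the speakers who supplied the templates, which is consistent with the abstract's finding that the HMM system does better on speaker-independent tasks.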