Reducing computational and memory cost for cellular phone embedded speech recognition system

Bibliographic details
Main authors:	Levy, C., Linares, G., Nocera, P., Bonastre, J.-F.
Format:	Conference proceeding
Language:	English
Subjects:	Applied sciences; Cellular phones; Coding, codes; Computational efficiency; Computer Science; Embedded computing; Equipments and installations; Exact sciences and technology; GSM; Hidden Markov models; Information, signal and communications theory; Linear predictive coding; Mel frequency cepstral coefficient; Mobile radiocommunication systems; Radiocommunications; Services and terminals of telecommunications; Signal and communications theory; Signal processing; Speech processing; Speech recognition; Systems, networks and services of telecommunications; Telecommunications; Telecommunications and information theory; Telephone. Videophone; Vectors

Description
Abstract:	We present several methods able to fit speech recognition system requirements to cellular phone resources. The proposed techniques are evaluated on a digit recognition task using both French and English corpora. We investigate three aspects of speech processing in particular: acoustic parameterization, recognition algorithms, and acoustic modeling. Several parameterization algorithms (LPCC, MFCC and PLP) are compared to the linear predictive coding (LPC) included in the GSM standard. The MFCC and PLP parameterization algorithms perform significantly better than the others. Moreover, feature vector size can be reduced to 6 PLP coefficients, allowing memory and computation resources to be decreased without a significant loss of performance. In order to achieve good performance with reasonable resource needs, we develop several methods to embed a classical HMM-based speech recognition system in a cellular phone. We first propose automatic on-line building of a phonetic lexicon, which allows a minimal but unlimited lexicon. Then we reduce the HMM complexity by decreasing the number of (Gaussian) components per state. Finally, we evaluate our proposals by comparing dynamic time warping (DTW) with our HMM system, in the cellular phone context, for clean conditions. The experiments show that our HMM system outperforms DTW for speaker-independent tasks and allows more practical applications for the cellular-phone user interface.
ISSN:	1520-6149 2379-190X
DOI:	10.1109/ICASSP.2004.1327109
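
The abstract compares the authors' HMM system against dynamic time warping (DTW). For readers unfamiliar with that baseline, the following is a minimal sketch of textbook DTW template matching, not the authors' implementation. The function names `dtw_distance` and `recognize`, the Euclidean local cost, and the unconstrained step pattern are all assumptions made for illustration; the record gives no such details.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension float vectors (e.g. per-frame features).
    Local cost is Euclidean distance (an assumption for this sketch).
    Only two rows of the cost table are kept, so memory is O(len(b)).
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Best of: advance in a only, advance in b only, advance in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance.

    templates: dict mapping a label to one reference sequence
    (hypothetical storage layout, one template per word).
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

Because each template is a recording from a particular speaker, a matcher of this kind is naturally speaker-dependent, which is consistent with the abstract's remark that the HMM system does better on speaker-independent tasks.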