Scalable architecture for word HMM-based speech recognition and VLSI implementation in complete system

Bibliographic details
Published in:	IEEE transactions on circuits and systems. 1, Fundamental theory and applications, 2006-01, Vol.53 (1), p.70-77
Main authors:	Yoshizawa, S., Wada, N., Hayasaka, N., Miyanaga, Y.
Format:	Article
Language:	English
Subjects:	Architecture; CMOS; Computational efficiency; Computer architecture; Concurrent computing; Hidden Markov model (HMM); Hidden Markov models; Integrated circuits; Libraries; Mathematical models; Noise robustness; Recognition; scalable architecture; Speech; Speech analysis; Speech recognition; Very large scale integration; VLSI implementation; Vocabulary

Description
Abstract:	This paper describes a scalable architecture for real-time speech recognizers based on word hidden Markov models (HMMs) that provide high recognition accuracy for word recognition tasks. However, the size of their recognition vocabulary is small because their extremely high computational costs cause long processing times. To achieve high-speed operations, we developed a VLSI system that has a scalable architecture. The architecture effectively uses parallel computations on the word HMM structure. It can reduce processing time and/or extend the word vocabulary. To explore the practicality of our architecture, we designed and evaluated a complete system recognizer, including speech analysis and noise robustness parts, on a 0.18-µm CMOS standard cell library and field-programmable gate array. In the CMOS standard-cell implementation, the total processing time is 56.9 µs/word at an operating frequency of 80 MHz in a single system. The recognizer gives a real-time response using an 800-word vocabulary.
ISSN:	1549-8328 1057-7122 1558-0806
DOI:	10.1109/TCSI.2005.854408
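
Note on the figures in the abstract: the three quoted numbers (56.9 µs/word, 80 MHz, 800 words) can be related by simple arithmetic. The sketch below is only an illustration; the function names are hypothetical, and reading "56.9 µs/word" as a cost that scales linearly with vocabulary size in a single system is an assumption, not something the record states.

```python
# Figures quoted in the abstract of the record above.
TIME_PER_WORD_S = 56.9e-6   # total processing time per vocabulary word, in seconds
CLOCK_HZ = 80e6             # operating frequency, in hertz
VOCABULARY_SIZE = 800       # number of words in the recognition vocabulary


def cycles_per_word(time_per_word_s: float, clock_hz: float) -> float:
    """Clock cycles spent per vocabulary word at the given frequency."""
    return time_per_word_s * clock_hz


def total_time_s(time_per_word_s: float, vocabulary_size: int) -> float:
    """Total time for the whole vocabulary.

    Assumption: the per-word time adds up linearly over the vocabulary.
    """
    return time_per_word_s * vocabulary_size


cycles = cycles_per_word(TIME_PER_WORD_S, CLOCK_HZ)
total = total_time_s(TIME_PER_WORD_S, VOCABULARY_SIZE)

print(f"cycles per word: {round(cycles)}")
print(f"time for {VOCABULARY_SIZE} words: {total * 1e3:.2f} ms")
```

Under that assumption, the figures work out to roughly 4552 clock cycles per word and about 45.52 ms for the full 800-word vocabulary.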