A 168-mW 2.4×-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

Bibliographic Details
Published in:	IEICE Transactions on Electronics 2013/04/01, Vol.E96.C(4), pp.444-453
Main Authors:	HE, Guangji; SUGAHARA, Takanobu; MIYAMOTO, Yuki; IZUMI, Shintaro; KAWAGUCHI, Hiroshi; YOSHIMOTO, Masahiko
Format:	Article
Language:	English
Subjects:	40nm VLSI; Accuracy; Chips; hidden Markov model (HMM); large vocabulary continuous speech recognition (LVCSR); Mathematical models; Power consumption; Real time; Reduction; Speech recognition; Very large scale integration
Online Access:	Full text

Description
Summary:	This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent hidden Markov model (HMM). The chip features a compression-decoding scheme that reduces the external memory bandwidth needed for Gaussian mixture model (GMM) computation, together with multi-path Viterbi transition units. The internal SRAM size is optimized by using a max-approximation GMM calculation and by adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm and contains 2.52 M transistors for logic and 4.29 Mbit of on-chip memory. Measured results show that, for 60-kWord real-time continuous speech recognition, the implementation reduces the required frequency by 34.2% (to 83.3 MHz) and the power consumption by 48.5% (to 74.14 mW) compared with the previous work, while saving 30% of the area and achieving a recognition accuracy of 90.9%. At 200 MHz and 1.1 V, the chip can process speech up to 2.4× faster than real time with a power consumption of 168 mW. Increasing the beam width yields better recognition accuracy (91.45%); in that case, the power consumption for real-time processing rises to 97.4 mW and the maximum performance drops to 2.08× real time because of the increased computation workload.
ISSN:	0916-8524 1745-1353
DOI:	10.1587/transele.E96.C.444
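
Note on the max-approximation GMM calculation: the summary names this technique but this record does not give the chip's exact formulation. The sketch below shows only the generic textbook idea, in which the log-likelihood of a mixture is approximated by the score of its single best component instead of the log of the sum over all components. The function names, the diagonal-covariance assumption, and the two-component model are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def component_log_scores(x, weights, means, variances):
    """Return log(w_k * N(x; mu_k, diag(var_k))) for each mixture component k."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalisation constant for a diagonal covariance.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Squared distance to the mean, scaled per dimension by the variance.
        dist = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        scores.append(math.log(w) + log_norm - 0.5 * dist)
    return scores

def gmm_log_likelihood_exact(scores):
    """Exact mixture log-likelihood via a numerically stable log-sum-exp."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def gmm_log_likelihood_max(scores):
    """Max approximation: keep only the best-scoring component."""
    return max(scores)

# Hypothetical 2-component, 2-dimensional model and one feature vector.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
x = [0.2, -0.1]

scores = component_log_scores(x, weights, means, variances)
exact = gmm_log_likelihood_exact(scores)
approx = gmm_log_likelihood_max(scores)
print(f"exact={exact:.4f}  max-approx={approx:.4f}")
```

The approximation never exceeds the exact value and underestimates it by at most log(K) for K components. It removes the exponentiation and summation, which is one plausible reason it could reduce hardware cost, although the record itself only states that it is used to optimize the internal SRAM size.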