A hybrid input-type recurrent neural network for LVCSR language modeling
Published in: | EURASIP Journal on Audio, Speech, and Music Processing, 2016-08, Vol.2016 (1), p.1, Article 15 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Robustly developing a language model for an open-vocabulary speech recognition system usually requires substantial resources, because out-of-vocabulary (OOV) words can hurt recognition accuracy. In this work, we applied a hybrid lexicon of word and sub-word units to resolve the OOV problem in a resource-efficient way. Because sub-lexical units can be combined to form new words, a compact hybrid vocabulary can be used while still maintaining a low OOV rate. For Thai, a syllable-based unit called the pseudo-morpheme (PM) was chosen as the sub-word unit. To also benefit from the different levels of linguistic information embedded in different input types, a hybrid recurrent neural network language model (RNNLM) framework is proposed. An RNNLM can not only model information from multiple types of input units through a hybrid input vector of words and PMs, but can also capture long context history through its recurrent connections. Several hybrid input representations were also explored to optimize both recognition accuracy and computational time. The hybrid LM has been shown to be both resource-efficient and accurate on two Thai LVCSR tasks: broadcast news transcription and speech-to-speech translation. The proposed hybrid lexicon can constitute an open vocabulary for Thai LVCSR, as it reduces the OOV rate to less than 1% while using only 42% of the vocabulary size of the word-based lexicon. In terms of recognition performance, the best proposed hybrid RNNLM, which uses a mixed word-PM input, obtained a 1.54% relative WER reduction compared with a conventional word-based RNNLM. In terms of computational time, the best hybrid RNNLM has the lowest training and decoding time among all RNNLMs, including the word-based RNNLM. The overall relative WER reduction of the proposed hybrid RNNLM over a traditional n-gram model is 6.91%. |
---|---|
ISSN: | 1687-4722, 1687-4714
DOI: | 10.1186/s13636-016-0093-x |
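
The hybrid-lexicon idea in the abstract (keep in-vocabulary words as words, back off to sub-word units for everything else) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy longest-match `segment` function is a hypothetical stand-in for Thai pseudo-morpheme segmentation, and the function names, the `<unk>` symbol, and the toy English vocabulary are all assumptions made for illustration.

```python
def segment(word, subword_units):
    """Greedy longest-match split of `word` into known sub-word units.

    Hypothetical stand-in for pseudo-morpheme segmentation.
    Returns None when the greedy pass cannot cover the whole word.
    """
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in subword_units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return None  # no known unit starts at position i
    return pieces


def tokenize_hybrid(words, word_vocab, subword_units):
    """Map words to a mixed stream of word and sub-word tokens."""
    tokens = []
    for w in words:
        if w in word_vocab:
            tokens.append(w)  # in-vocabulary: keep the word unit
        else:
            pieces = segment(w, subword_units)
            # Back off to sub-word units; mark as unknown only if that fails.
            tokens.extend(pieces if pieces is not None else ["<unk>"])
    return tokens


def oov_rate(tokens):
    """Fraction of tokens that could not be covered by the hybrid lexicon."""
    return tokens.count("<unk>") / len(tokens) if tokens else 0.0


# Toy data (assumed, not from the paper).
word_vocab = {"the", "news"}
subword_units = {"broad", "cast", "er"}
tokens = tokenize_hybrid(["the", "broadcaster", "news", "xyz"],
                         word_vocab, subword_units)
print(tokens)
print(oov_rate(tokens))
```

In this toy run, "broadcaster" is absent from the word vocabulary but is still covered by three sub-word units, so only "xyz" remains unknown. Note that a greedy pass can miss a segmentation that a search-based segmenter would find, so it is a simplification even for this illustrative purpose.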