Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present results that show it is possible to build a competitive, greatly
simplified, large vocabulary continuous speech recognition system with whole
words as acoustic units. We model the output vocabulary of about 100,000 words
directly using deep bi-directional LSTM RNNs with CTC loss. The model is
trained on 125,000 hours of semi-supervised acoustic training data, which
enables us to alleviate the data sparsity problem for word models. We show that
the CTC word models work very well as an end-to-end all-neural speech
recognition model without the use of traditional context-dependent sub-word
phone units that require a pronunciation lexicon, and without any language
model, removing the need to decode. We demonstrate that the CTC word models
perform better than a strong, more complex, state-of-the-art baseline with
sub-word units. |
---|---|
DOI: | 10.48550/arxiv.1610.09975 |
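
The summary's point that a CTC word model needs neither a language model nor a decoder can be illustrated with a greedy readout: take the best-scoring label in each frame, merge consecutive repeats, and drop the blank symbol. The following is a minimal sketch under stated assumptions, not the authors' implementation; the function name `greedy_ctc_words`, the blank index `0`, and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def greedy_ctc_words(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: int = BLANK,
) -> List[str]:
    """Turn per-frame word scores into a word sequence without a decoder.

    frame_scores[t][k] is the model's score for vocabulary entry k at frame t.
    """
    words: List[str] = []
    prev = blank
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a word only when it is not blank and not a repeat of the
        # previous frame's label (the standard CTC collapsing rule).
        if best != blank and best != prev:
            words.append(vocab[best])
        prev = best
    return words


# Toy example: 3 labels (blank plus two words), 5 frames.
toy_vocab = ["<blank>", "hello", "world"]
toy_scores = [
    [0.1, 0.8, 0.1],    # hello
    [0.1, 0.8, 0.1],    # hello (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # hello (new word after a blank)
    [0.1, 0.1, 0.8],    # world
]
result = greedy_ctc_words(toy_scores, toy_vocab)
```

In a word-level system of the kind the summary describes, `vocab` would hold roughly 100,000 entries and the scores would come from the bi-directional LSTM's output layer; the readout step itself stays this simple.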