LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition
Abstract: LSTM language models (LSTM-LMs) have proven to be powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Due to their infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second pass for rescoring purposes. Recent work shows that it is feasible and computationally affordable to adopt LSTM-LMs in first-pass decoding within a dynamic (or tree-based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on the fly for first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of the LSTM-LM across utterances and can be used to guide the first-pass search effectively. Experimental results on our internal meeting transcription system show that significant performance improvements can be obtained by incorporating the contextual information with LSTM-LMs in first-pass decoding, compared to applying the contextual information in second-pass rescoring.
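As a rough illustration of the cross-utterance mechanism the abstract describes, the sketch below shows how an LSTM-LM's hidden state can be threaded from one utterance to the next so that earlier conversation context conditions the scores used during first-pass search. This is a minimal sketch under assumed details, not the authors' system; the class and function names (LstmLm, score_utterance) and all dimensions are hypothetical, and the WFST on-the-fly composition itself is not shown.

```python
# Minimal sketch (hypothetical names, not the paper's implementation):
# an LSTM language model whose final hidden state after one utterance
# is passed in as the initial state for the next utterance, giving the
# LM history beyond the current utterance.
import torch
import torch.nn as nn


class LstmLm(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, time) word ids; state: (h, c) carried over from
        # the previous utterance, or None at the start of a conversation.
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state


def score_utterance(model, tokens, state=None):
    """Return per-token log-probabilities and the updated hidden state.

    Feeding the returned state into the next call conditions the LM on
    context beyond the current utterance.
    """
    with torch.no_grad():
        logits, state = model(tokens.unsqueeze(0), state)
        log_probs = torch.log_softmax(logits, dim=-1)
    return log_probs.squeeze(0), state


# Usage: score a conversation utterance by utterance, threading the state.
model = LstmLm(vocab_size=10000)
state = None  # no context before the first utterance
for utt in [torch.tensor([1, 5, 7, 2]), torch.tensor([1, 9, 3, 2])]:
    log_probs, state = score_utterance(model, utt, state)
```

In an actual first-pass decoder, such LM scores would be combined with acoustic scores on the search graph rather than computed in isolation; the sketch only shows how the cross-utterance state carryover could look.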
DOI: 10.48550/arxiv.2010.11349