On-the-fly Text Retrieval for End-to-End ASR Adaptation
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | End-to-end speech recognition models are improved by incorporating external
text sources, typically by fusion with an external language model. Such
language models have to be retrained whenever the corpus of interest changes.
Furthermore, since they store the entire corpus in their parameters, rare words
can be challenging to recall. In this work, we propose augmenting a
transducer-based ASR model with a retrieval language model, which directly
retrieves from an external text corpus plausible completions for a partial ASR
hypothesis. These completions are then integrated into subsequent predictions
by an adapter, which is trained once, so that the corpus of interest can be
switched without incurring the computational overhead of retraining. Our
experiments show that the proposed model significantly improves the performance
of a transducer baseline on a pair of question-answering datasets. Further, it
outperforms shallow fusion on recognition of named entities by about 7%
relative; when the two are combined, the relative improvement increases to 13%. |
---|---|
DOI: | 10.48550/arxiv.2303.10942 |
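
The summary describes retrieving plausible completions for a partial ASR hypothesis from an external text corpus, with the corpus being swappable without retraining. The following is a minimal illustrative sketch of that idea only, not the paper's model: it substitutes a simple exact-match lookup on the last few words for the learned retrieval language model, and it omits the adapter and the transducer entirely. The function names `build_index` and `retrieve_completions`, the `context_len` parameter, and the example corpora are all hypothetical.

```python
from collections import defaultdict


def build_index(corpus, context_len=2):
    """Map every run of `context_len` words to the words that follow it.

    Assumption: exact n-gram matching stands in for the learned retrieval
    described in the summary; it is not the paper's method.
    """
    index = defaultdict(list)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - context_len):
            context = tuple(tokens[i:i + context_len])
            index[context].append(tokens[i + context_len])
    return index


def retrieve_completions(index, partial_hypothesis, context_len=2):
    """Return candidate next words for a partial hypothesis, in corpus order."""
    tokens = partial_hypothesis.lower().split()
    if len(tokens) < context_len:
        return []
    context = tuple(tokens[-context_len:])
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(index.get(context, [])))


# Two hypothetical corpora of interest; switching between them only
# requires rebuilding the index, not retraining anything.
corpus_a = ["who wrote the novel middlemarch", "who wrote the opera tosca"]
corpus_b = ["who wrote the play hamlet"]

index_a = build_index(corpus_a)
index_b = build_index(corpus_b)

completions_a = retrieve_completions(index_a, "who wrote the")
completions_b = retrieve_completions(index_b, "who wrote the")
```

In this toy version, changing the corpus changes the retrieved candidates for the same partial hypothesis, which mirrors the property the summary highlights: the corpus of interest can be swapped without retraining.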