Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | In speech recognition applications, it is important to recognize
context-specific rare words, such as proper nouns. The Tree-constrained Pointer
Generator (TCPGen), which efficiently biases recognition toward such words using
a prefix tree, has shown promise for this purpose. While the original TCPGen
relies on grapheme-based encoding, we propose extending it with phoneme-aware
encoding to better recognize words with unusual pronunciations. Because TCPGen
handles biasing words as subword units, we propose obtaining subword-level
phoneme-aware encodings by using the alignment between phonemes and subwords.
Furthermore, we propose injecting phoneme-level predictions from CTC into the
queries of TCPGen so that the model better interprets the phoneme-aware
encodings. We conducted ASR experiments with TCPGen for an RNN transducer. The
proposed phoneme-aware encoding outperformed ordinary grapheme-based encoding on
both the English LibriSpeech and Japanese CSJ datasets, demonstrating the
robustness of our approach across linguistically diverse languages. |
---|---|
DOI: | 10.48550/arxiv.2312.09582 |
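The abstract names three mechanisms without detail: a prefix tree over the subword units of biasing words, subword-level phoneme-aware encodings derived from a phoneme-to-subword alignment, and CTC phoneme predictions injected into TCPGen's queries. The Python sketch below is a toy illustration of how these pieces could fit together, not the authors' implementation. All names (`build_tree`, `make_query`, `tcpgen_distribution`, `interpolate`) are hypothetical, and so are the design choices: each subword key is its grapheme embedding concatenated with the mean of its aligned phoneme embeddings, the query is a decoder state concatenated with the CTC-posterior-weighted phoneme embedding, and scoring is plain scaled dot-product attention. Embeddings are hand-made toy vectors rather than learned parameters, and details of the real TCPGen such as its out-of-list token are omitted.

```python
"""Toy sketch: TCPGen-style prefix-tree biasing with phoneme-aware keys.

Hypothetical design choices (NOT taken from the paper):
- each subword of a biasing word is aligned to >= 1 phonemes;
- a subword's key = its grapheme embedding ++ mean of its phoneme embeddings;
- the query = decoder state ++ CTC-posterior-weighted phoneme embedding.
"""
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # subword -> child Node
    keys: dict = field(default_factory=dict)      # subword -> key vector


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_tree(words, sub_emb, phn_emb):
    """words: one list of (subword, [aligned phonemes]) pairs per biasing word."""
    root = Node()
    for word in words:
        node = root
        for subword, phonemes in word:
            key = sub_emb[subword] + mean([phn_emb[p] for p in phonemes])
            node.keys.setdefault(subword, key)  # first alignment wins (toy)
            node = node.children.setdefault(subword, Node())
    return root


def make_query(dec_state, ctc_post, phn_emb):
    """Append the expected phoneme embedding under CTC posteriors."""
    dim = len(next(iter(phn_emb.values())))
    expected = [0.0] * dim
    for phoneme, prob in ctc_post.items():
        expected = [e + prob * v for e, v in zip(expected, phn_emb[phoneme])]
    return dec_state + expected


def tcpgen_distribution(node, query):
    """Scaled dot-product attention over subwords that extend the prefix."""
    scale = math.sqrt(len(query))
    scores = {s: sum(q * k for q, k in zip(query, key)) / scale
              for s, key in node.keys.items()}
    top = max(scores.values())  # subtract max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(exps.values())
    return {s: v / total for s, v in exps.items()}


def interpolate(p_model, p_ptr, p_gen):
    """P(y) = (1 - p_gen) * P_model(y) + p_gen * P_ptr(y)."""
    vocab = set(p_model) | set(p_ptr)
    return {y: (1 - p_gen) * p_model.get(y, 0.0) + p_gen * p_ptr.get(y, 0.0)
            for y in vocab}


# Toy usage: two biasing words sharing the first subword "ka".
sub_emb = {"ka": [1.0, 0.0], "wa": [0.0, 1.0], "to": [0.5, 0.5]}
phn_emb = {"k": [1.0, 0.0], "a": [0.0, 1.0], "w": [1.0, 1.0],
           "t": [0.0, 0.0], "o": [0.5, 0.0]}
words = [[("ka", ["k", "a"]), ("wa", ["w", "a"])],
         [("ka", ["k", "a"]), ("to", ["t", "o"])]]
root = build_tree(words, sub_emb, phn_emb)
node = root.children["ka"]                      # prefix "ka" already decoded
query = make_query([0.0, 1.0], {"w": 0.9, "t": 0.1}, phn_emb)
p_ptr = tcpgen_distribution(node, query)        # favours "wa" over "to"
final = interpolate({"wa": 0.1, "to": 0.2, "ki": 0.7}, p_ptr, p_gen=0.5)
```

Laying out the query in the same grapheme-plus-phoneme order as the keys is what lets the CTC phoneme evidence (mostly "w") raise the score of "wa" over "to" in this toy case; in a trained model both halves would come from learned projections rather than fixed vectors.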