Towards Maximum Likelihood Training for Transducer-Based Streaming Speech Recognition

Bibliographic details
Published in:	IEEE Signal Processing Letters, 2025, Vol. 32, pp. 26-30
Authors:	Lee, Hyeonseung; Yoon, Ji Won; Kim, Sungsoo; Kim, Nam Soo
Format:	Article
Language:	English
Keywords:	Accuracy; Automatic speech recognition; automatic speech recognition (ASR); Bayes methods; Context modeling; Deformable models; Dynamic programming; Inference; Mathematical models; Maximum likelihood estimation; Network latency; Neural networks; Recursion; RNN-transducer (RNN-T); Speech recognition; streaming ASR; Training; transducer neural network; Transducers; Voice recognition

Description
Abstract:	Transducer neural networks have emerged as the mainstream approach for streaming automatic speech recognition (ASR), offering state-of-the-art performance in balancing accuracy and latency. In the conventional framework, streaming transducer models are trained to maximize a likelihood function computed with non-streaming recursion rules. However, this approach creates a mismatch between training and inference, which deforms the likelihood and consequently yields suboptimal ASR accuracy. We introduce a mathematical quantification of the gap between the actual likelihood and the deformed likelihood, namely forward variable causal compensation (FoCC). We also present its estimator, FoCCE, as a means of estimating the exact likelihood. Through experiments on the LibriSpeech dataset, we show that FoCCE training improves the accuracy of streaming transducers.
ISSN:	1070-9908 1558-2361
DOI:	10.1109/LSP.2024.3491019
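The abstract refers to the conventional likelihood computed by a forward (dynamic-programming) recursion over the transducer lattice. The sketch below shows the standard RNN-T forward algorithm, assuming the joint network's log-probabilities `log_probs[t][u][k]` (shape T x (U+1) x V) are already available and blank has index 0; the function and helper names are hypothetical. It illustrates only the conventional, full-context recursion. It does not implement FoCC or FoCCE, whose details are not given in this record.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_log_likelihood(log_probs, labels, blank=0):
    """Standard RNN-T forward recursion (hypothetical helper, not FoCCE).

    log_probs[t][u][k]: log P(symbol k | frame t, u labels emitted), T x (U+1) x V.
    labels: target label sequence of length U.
    Returns log P(labels | input), summed over all alignments.
    """
    T, U = len(log_probs), len(labels)
    # alpha[t][u]: log-probability of reaching lattice node (t, u).
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t-1, u), i.e. advancing one frame.
            from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Arrive by emitting label u at (t, u-1), staying on the same frame.
            from_label = (alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                          if u > 0 else -math.inf)
            alpha[t][u] = logsumexp2(from_blank, from_label)
    # A final blank at the last node terminates every alignment.
    return alpha[T - 1][U] + log_probs[T - 1][U][blank]
```

As a sanity check, with a uniform output distribution over V symbols every alignment has probability V^-(T+U), and there are C(T+U-1, U) alignments, so the result should equal log C(T+U-1, U) - (T+U) log V. The abstract's argument is that, in a streaming model, the frame-t outputs depend only on limited context, so applying this recursion unchanged during training no longer matches inference.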