A Language Agnostic Multilingual Streaming On-Device ASR System
Main authors: | , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | On-device end-to-end (E2E) models have shown improvements over a conventional
model on English Voice Search tasks in both quality and latency. E2E models
have also shown promising results for multilingual automatic speech recognition
(ASR). In this paper, we extend our previous capacity solution to streaming
applications and present a streaming multilingual E2E ASR system that runs
fully on device with comparable quality and latency to individual monolingual
models. To achieve that, we propose an Encoder Endpointer model and an
End-of-Utterance (EOU) Joint Layer for a better quality and latency trade-off.
Our system is built in a language agnostic manner allowing it to natively
support intersentential code switching in real time. To address the feasibility
concerns on large models, we conducted on-device profiling and replaced the
time-consuming LSTM decoder with the recently developed Embedding decoder. With
these changes, we managed to run such a system on a mobile device in less than
real time. |
DOI: | 10.48550/arxiv.2208.13916 |