Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Main Authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Conversational bilingual speech encompasses three types of utterances: two
purely monolingual types and one intra-sententially code-switched type. In this
work, we propose a general framework to jointly model the likelihoods of the
monolingual and code-switch sub-tasks that comprise bilingual speech
recognition. By defining the monolingual sub-tasks with label-to-frame
synchronization, our joint modeling framework can be conditionally factorized
such that the final bilingual output, which may or may not be code-switched, is
obtained given only monolingual information. We show that this conditionally
factorized joint framework can be modeled by an end-to-end differentiable
neural network. We demonstrate the efficacy of our proposed model on bilingual
Mandarin-English speech recognition across both monolingual and code-switched
corpora. |
---|---|
DOI: | 10.48550/arxiv.2111.15016 |