Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | The kNN-CTC model has proven to be effective for monolingual automatic speech
recognition (ASR). However, its direct application to multilingual scenarios
such as code-switching presents challenges. Although there is potential for
performance improvement, a kNN-CTC model using a single bilingual datastore
can inadvertently introduce undesirable noise from the alternative language. To
address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR)
framework that employs dual monolingual datastores and a gated datastore
selection mechanism to reduce noise interference. Our method selects the
appropriate datastore for decoding each frame, ensuring the injection of
language-specific information into the ASR process. We apply this framework to
cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive
experiments demonstrate the remarkable effectiveness of our gated datastore
mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR. |
---|---|
DOI: | 10.48550/arxiv.2406.03814 |
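
The per-frame datastore selection described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the record does not say how the gate decides, so the sketch assumes the gate follows the language of the CTC model's most probable token, skips retrieval on blank frames, and mixes the retrieved distribution with the CTC posterior using a fixed weight. The names `knn_distribution`, `gated_knn_ctc_frame`, `token_lang`, `lam`, and the toy datastores are all hypothetical.

```python
import math


def knn_distribution(query, datastore, k, vocab_size, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over labels.

    datastore is a non-empty list of (key_vector, label_id) pairs.
    """
    # Squared L2 distance from the query to every key, nearest first.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), label)
        for key, label in datastore
    )[:k]
    # Softmax over negative distances (shifted by the minimum for stability).
    d_min = scored[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in scored]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, label) in zip(weights, scored):
        probs[label] += w / total
    return probs


def gated_knn_ctc_frame(ctc_probs, query, stores, token_lang,
                        blank_id=0, k=2, lam=0.5):
    """Interpolate one frame's CTC posterior with a monolingual kNN distribution.

    Assumed gate: the language of the CTC argmax token picks the datastore.
    """
    top = max(range(len(ctc_probs)), key=ctc_probs.__getitem__)
    if top == blank_id:
        # Assumption: blank frames carry no language cue, so no retrieval.
        return list(ctc_probs)
    store = stores[token_lang[top]]  # only one language's datastore is queried
    knn = knn_distribution(query, store, k, len(ctc_probs))
    return [lam * pk + (1.0 - lam) * pc for pk, pc in zip(knn, ctc_probs)]


# Toy vocabulary: 0 = blank, 1 and 3 = Chinese tokens, 2 = English token.
token_lang = {1: "zh", 2: "en", 3: "zh"}
stores = {
    "zh": [([0.0, 0.0], 1), ([1.0, 0.0], 3)],
    "en": [([0.0, 0.1], 2)],
}
frame = gated_knn_ctc_frame([0.1, 0.5, 0.3, 0.1], [0.0, 0.0], stores, token_lang)
```

In this toy frame the CTC argmax is a Chinese token, so only the Chinese datastore is searched and the English token receives no retrieved probability mass, which is the noise-avoidance effect the summary attributes to gating.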