Decoupling recognition and transcription in Mandarin ASR
Format: Article
Language: eng
Online access: Order full text
Abstract: Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
DOI: 10.48550/arxiv.2108.01129
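
The factoring described in the abstract can be illustrated with a minimal sketch of how the two sub-tasks compose. The function names and the toy lookup table below are hypothetical placeholders standing in for the paper's trained models; they are not the authors' implementation.

```python
from typing import List

def audio_to_pinyin(audio: bytes) -> List[str]:
    # Placeholder for sub-task (1): an acoustic model would map audio to
    # tone-marked Pinyin syllables. Hard-coded output for illustration only.
    return ["ni3", "hao3"]

def pinyin_to_hanzi(pinyin: List[str]) -> str:
    # Placeholder for sub-task (2): a model would map Pinyin to Hanzi,
    # resolving homophones from context. A toy lookup table is used here.
    toy_lexicon = {("ni3", "hao3"): "你好"}
    return toy_lexicon.get(tuple(pinyin), "")

def transcribe(audio: bytes) -> str:
    # The factored pipeline: audio -> Pinyin -> Hanzi.
    return pinyin_to_hanzi(audio_to_pinyin(audio))

if __name__ == "__main__":
    print(transcribe(b""))  # prints: 你好
```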