The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
Main authors: | , , , , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | This report describes the system submitted to the In-Car Multi-Channel
Automatic Speech Recognition (ICMC-ASR) challenge, which addresses the ASR task
with multi-speaker overlap and Mandarin accent dynamics in the in-car
multi-channel (ICMC) setting. For the front end, we implement speaker
diarization using multi-speaker embeddings based on self-supervised learning
representations, and beamforming using the speaker position. For ASR, we employ
an iterative pseudo-label generation method based on a fusion model to obtain
text labels for unlabeled data. To mitigate the impact of accent, we propose an
Accent-ASR framework that captures pronunciation-related accent features at a
fine-grained level and linguistic information at a coarse-grained level. On the
ICMC-ASR eval set, the proposed system achieves a CER of 13.16% on track 1 and
a cpCER of 21.48% on track 2, significantly outperforming the official baseline
system and ranking first on both tracks. |
---|---|
DOI: | 10.48550/arxiv.2407.02052 |