The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge
Main authors: | , , , , , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | This technical report details our submission to the CHiME-7 DASR
Challenge, which focuses on speaker diarization and speech recognition in
complex multi-speaker scenarios. The challenge also evaluates how effectively
systems handle diverse array devices. To address these issues, we
implemented an end-to-end speaker diarization system and introduced a
rectification strategy based on multi-channel spatial information. This
approach significantly reduced the word error rate (WER). For
recognition, we used publicly available pre-trained models as
foundation models for training our end-to-end speech recognition models. Our
system attained a macro-averaged diarization-attributed WER (DA-WER) of 21.01%
on the CHiME-7 evaluation set, a relative improvement of 62.04%
over the official baseline system. |
---|---|
DOI: | 10.48550/arxiv.2308.14638 |
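
The two figures in the summary (a DA-WER of 21.01% and a relative improvement of 62.04%) are linked by simple arithmetic, which the sketch below illustrates. It assumes that "relative improvement" means the reduction in error rate divided by the baseline error rate, and that "macro-averaged" means an unweighted mean over evaluation scenarios; neither definition is spelled out in this record. The function names and the per-scenario values are hypothetical and serve only as illustration. The baseline figure is derived from the two reported numbers, not quoted from the report.

```python
def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Relative error-rate reduction in percent (assumed definition)."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def implied_baseline(system_wer: float, rel_improvement_pct: float) -> float:
    """Invert the formula above to recover the baseline error rate."""
    return system_wer / (1.0 - rel_improvement_pct / 100.0)


def macro_average(per_scenario_wer: list[float]) -> float:
    """Unweighted mean over scenarios (assumed meaning of 'macro-averaged')."""
    return sum(per_scenario_wer) / len(per_scenario_wer)


# Figures reported in the summary.
system_da_wer = 21.01
reported_improvement = 62.04

# Baseline DA-WER implied by the two reported figures (roughly 55%).
baseline_da_wer = implied_baseline(system_da_wer, reported_improvement)
print(f"implied baseline DA-WER: {baseline_da_wer:.2f}%")

# Hypothetical per-scenario values, for illustration only.
print(f"macro average: {macro_average([10.0, 20.0, 30.0]):.2f}%")
```

Because the macro average weights every scenario equally, a scenario with little data influences the final score as much as a large one.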