NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Format: Article
Language: English
Abstract: We introduce the first Natural Office Talkers in Settings of Far-field
Audio Recordings ("NOTSOFAR-1") Challenge, alongside new datasets and a baseline
system. The challenge focuses on distant speaker diarization and automatic
speech recognition (DASR) in far-field meeting scenarios, with a single-channel
track and a known-geometry multi-channel track. It also serves as a launch
platform for two new datasets. The first is a benchmarking dataset of 315
meetings, averaging 6 minutes each, that captures a broad spectrum of real-world
acoustic conditions and conversational dynamics. It was recorded across 30
conference rooms, with 4 to 8 attendees per meeting and a total of 35 unique
speakers. The second is a 1000-hour simulated training dataset, synthesized
with enhanced authenticity to improve real-world generalization and
incorporating 15,000 real acoustic transfer functions. The tasks focus on
single-device DASR, where all multi-channel devices share the same known
geometry. This matches common setups in actual conference rooms, avoids the
technical complexities of multi-device tasks, and allows for the development of
geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research
in distant conversational speech recognition by providing key resources to
unlock the potential of data-driven methods, which we believe are currently
constrained by the absence of comprehensive, high-quality training and
benchmarking datasets.
DOI: 10.48550/arxiv.2401.08887
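The abstract says only that the 1000-hour training set was synthesized using 15,000 real acoustic transfer functions (ATFs); it does not describe the pipeline. As a rough illustration of the general idea, the sketch below turns dry single-speaker signals into a simulated multi-channel far-field mixture: each speaker's signal is convolved with an impulse response for every microphone, and the contributions are summed per microphone. Everything here is an assumption and not the NOTSOFAR-1 recipe: treating each ATF as a time-domain room impulse response, the hypothetical nested layout `atfs[k][m]` (speaker `k`, microphone `m`), the hypothetical helpers `convolve` and `simulate_far_field`, and white Gaussian noise standing in for real recorded noise.

```python
import random
from typing import List, Sequence


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(impulse_response) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def simulate_far_field(
    dry_sources: Sequence[Sequence[float]],
    atfs: Sequence[Sequence[Sequence[float]]],
    noise_std: float = 0.0,
    seed: int = 0,
) -> List[List[float]]:
    """Mix dry speaker signals into a simulated multi-channel far-field recording.

    dry_sources[k] is the clean signal of speaker k.
    atfs[k][m] is a time-domain impulse response from speaker k to microphone m
    (a hypothetical layout, not the dataset's actual file format).
    Returns one signal per microphone, zero-padded to a common length.
    """
    if not dry_sources or len(dry_sources) != len(atfs):
        raise ValueError("need at least one speaker and one set of ATFs per speaker")
    num_mics = len(atfs[0])
    # A fixed, known array geometry means every speaker has an ATF per microphone.
    if any(len(per_speaker) != num_mics for per_speaker in atfs):
        raise ValueError("every speaker must have an ATF for every microphone")

    # Common output length: the longest reverberant contribution over all pairs.
    length = max(
        len(src) + len(atfs[k][m]) - 1
        for k, src in enumerate(dry_sources)
        for m in range(num_mics)
    )
    rng = random.Random(seed)
    channels: List[List[float]] = []
    for m in range(num_mics):
        mix = [0.0] * length
        # Each microphone hears the sum of every speaker convolved with its ATF.
        for k, src in enumerate(dry_sources):
            for t, v in enumerate(convolve(src, atfs[k][m])):
                mix[t] += v
        # Optional white Gaussian noise, a simple stand-in for real background noise.
        if noise_std > 0.0:
            mix = [v + rng.gauss(0.0, noise_std) for v in mix]
        channels.append(mix)
    return channels


if __name__ == "__main__":
    # Two speakers, two microphones, toy impulse responses.
    mics = simulate_far_field(
        [[1.0, 2.0], [0.0, 1.0]],
        [[[1.0], [0.0, 1.0]], [[0.5], [1.0]]],
    )
    print(mics)  # [[1.0, 2.5, 0.0], [0.0, 2.0, 2.0]]
```

Pure Python keeps the sketch dependency-free; a practical pipeline would more likely use FFT-based convolution over arrays of real recordings, and would control signal-to-noise ratio rather than a raw noise standard deviation.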