VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Recent research in speaker recognition aims to address vulnerabilities due to
variations between enrolment and test utterances, particularly in the
multi-genre phenomenon where the utterances are in different speech genres.
Previous resources for Vietnamese speaker recognition are either limited in
size or do not focus on genre diversity, leaving studies in multi-genre effects
unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for
Vietnamese speaker recognition with over 187,000 utterances from 1,406 speakers
and an automated pipeline to construct a dataset on a large scale from public
sources. Our experiments show the challenges posed by the multi-genre
phenomenon to models trained on a single-genre dataset, and demonstrate a
significant increase in performance upon incorporating VoxVietnam into the
training process. Our experiments are conducted to study the challenges of the
multi-genre phenomenon in speaker recognition and the performance gain when the
proposed dataset is used for multi-genre training. |
---|---|
DOI: | 10.48550/arxiv.2501.00328 |