VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition
Main authors: | Vu, Hoang Long ; Dat, Phuong Tuan ; Nhi, Pham Thao ; Hao, Nguyen Song ; Trang, Nguyen Thi Thu |
---|---|
Format: | Article |
Language: | eng |
creator | Vu, Hoang Long ; Dat, Phuong Tuan ; Nhi, Pham Thao ; Hao, Nguyen Song ; Trang, Nguyen Thi Thu |
---|---|
description | Recent research in speaker recognition aims to address vulnerabilities caused by variations between enrolment and test utterances, particularly the multi-genre phenomenon, in which the utterances belong to different speech genres. Previous resources for Vietnamese speaker recognition are either limited in size or do not focus on genre diversity, leaving multi-genre effects unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for Vietnamese speaker recognition, with over 187,000 utterances from 1,406 speakers, together with an automated pipeline for constructing such a dataset at scale from public sources. Our experiments show the challenges that the multi-genre phenomenon poses to models trained on a single-genre dataset, and demonstrate a significant increase in performance when VoxVietnam is incorporated into multi-genre training. |
doi_str_mv | 10.48550/arxiv.2501.00328 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2501_00328</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2501_00328</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2501_003283</originalsourceid><addsrcrecordid>eNqFjssKgkAUQGfTIqoPaNX9gbFRE6Rtz0VBZLiVi1xlaJyR6xT295G4b3U2B84RYhmqYJMmiVoj9_odRIkKA6XiKJ2KW-76XJO32GwB4YJck8xKNATXl_FansgywR49duShcgyjTh1B1hI-ieFOpaut9trZuZhUaDpajJyJ1fHw2J3l0C5a1g3yp_g9FMND_N_4AqKGPAI</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition</title><source>arXiv.org</source><creator>Vu, Hoang Long ; Dat, Phuong Tuan ; Nhi, Pham Thao ; Hao, Nguyen Song ; Trang, Nguyen Thi Thu</creator><creatorcontrib>Vu, Hoang Long ; Dat, Phuong Tuan ; Nhi, Pham Thao ; Hao, Nguyen Song ; Trang, Nguyen Thi Thu</creatorcontrib><description>Recent research in speaker recognition aims to address vulnerabilities due to
variations between enrolment and test utterances, particularly in the
multi-genre phenomenon where the utterances are in different speech genres.
Previous resources for Vietnamese speaker recognition are either limited in
size or do not focus on genre diversity, leaving studies in multi-genre effects
unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for
Vietnamese speaker recognition with over 187,000 utterances from 1,406 speakers
and an automated pipeline to construct a dataset on a large scale from public
sources. Our experiments show the challenges posed by the multi-genre
phenomenon to models trained on a single-genre dataset, and demonstrate a
significant increase in performance upon incorporating the VoxVietnam into the
training process. Our experiments are conducted to study the challenges of the
multi-genre phenomenon in speaker recognition and the performance gain when the
proposed dataset is used for multi-genre training.</description><identifier>DOI: 10.48550/arxiv.2501.00328</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Sound</subject><creationdate>2024-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2501.00328$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2501.00328$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Vu, Hoang Long</creatorcontrib><creatorcontrib>Dat, Phuong Tuan</creatorcontrib><creatorcontrib>Nhi, Pham Thao</creatorcontrib><creatorcontrib>Hao, Nguyen Song</creatorcontrib><creatorcontrib>Trang, Nguyen Thi Thu</creatorcontrib><title>VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition</title><description>Recent research in speaker recognition aims to address vulnerabilities due to
variations between enrolment and test utterances, particularly in the
multi-genre phenomenon where the utterances are in different speech genres.
Previous resources for Vietnamese speaker recognition are either limited in
size or do not focus on genre diversity, leaving studies in multi-genre effects
unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for
Vietnamese speaker recognition with over 187,000 utterances from 1,406 speakers
and an automated pipeline to construct a dataset on a large scale from public
sources. Our experiments show the challenges posed by the multi-genre
phenomenon to models trained on a single-genre dataset, and demonstrate a
significant increase in performance upon incorporating the VoxVietnam into the
training process. Our experiments are conducted to study the challenges of the
multi-genre phenomenon in speaker recognition and the performance gain when the
proposed dataset is used for multi-genre training.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjssKgkAUQGfTIqoPaNX9gbFRE6Rtz0VBZLiVi1xlaJyR6xT295G4b3U2B84RYhmqYJMmiVoj9_odRIkKA6XiKJ2KW-76XJO32GwB4YJck8xKNATXl_FansgywR49duShcgyjTh1B1hI-ieFOpaut9trZuZhUaDpajJyJ1fHw2J3l0C5a1g3yp_g9FMND_N_4AqKGPAI</recordid><startdate>20241231</startdate><enddate>20241231</enddate><creator>Vu, Hoang Long</creator><creator>Dat, Phuong Tuan</creator><creator>Nhi, Pham Thao</creator><creator>Hao, Nguyen Song</creator><creator>Trang, Nguyen Thi Thu</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241231</creationdate><title>VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition</title><author>Vu, Hoang Long ; Dat, Phuong Tuan ; Nhi, Pham Thao ; Hao, Nguyen Song ; Trang, Nguyen Thi Thu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2501_003283</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Vu, Hoang Long</creatorcontrib><creatorcontrib>Dat, Phuong Tuan</creatorcontrib><creatorcontrib>Nhi, Pham Thao</creatorcontrib><creatorcontrib>Hao, Nguyen Song</creatorcontrib><creatorcontrib>Trang, Nguyen Thi Thu</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vu, Hoang Long</au><au>Dat, Phuong Tuan</au><au>Nhi, Pham Thao</au><au>Hao, Nguyen Song</au><au>Trang, Nguyen Thi 
Thu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition</atitle><date>2024-12-31</date><risdate>2024</risdate><abstract>Recent research in speaker recognition aims to address vulnerabilities due to
variations between enrolment and test utterances, particularly in the
multi-genre phenomenon where the utterances are in different speech genres.
Previous resources for Vietnamese speaker recognition are either limited in
size or do not focus on genre diversity, leaving studies in multi-genre effects
unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for
Vietnamese speaker recognition with over 187,000 utterances from 1,406 speakers
and an automated pipeline to construct a dataset on a large scale from public
sources. Our experiments show the challenges posed by the multi-genre
phenomenon to models trained on a single-genre dataset, and demonstrate a
significant increase in performance upon incorporating the VoxVietnam into the
training process. Our experiments are conducted to study the challenges of the
multi-genre phenomenon in speaker recognition and the performance gain when the
proposed dataset is used for multi-genre training.</abstract><doi>10.48550/arxiv.2501.00328</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2501.00328 |
language | eng |
recordid | cdi_arxiv_primary_2501_00328 |
source | arXiv.org |
subjects | Computer Science - Computation and Language ; Computer Science - Sound |
title | VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T12%3A32%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=VoxVietnam:%20a%20Large-Scale%20Multi-Genre%20Dataset%20for%20Vietnamese%20Speaker%20Recognition&rft.au=Vu,%20Hoang%20Long&rft.date=2024-12-31&rft_id=info:doi/10.48550/arxiv.2501.00328&rft_dat=%3Carxiv_GOX%3E2501_00328%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |