The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

This paper describes our DKU replay detection system for the ASVspoof 2019 challenge. The goal is to develop a spoofing countermeasure for automatic speaker recognition in the physical access scenario. We address the countermeasure system pipeline from four aspects, including data augmentation, featu...

Detailed description

Bibliographic details
Main authors:	Cai, Weicheng, Wu, Haiwei, Cai, Danwei, Li, Ming
Format:	Article
Language:	eng
Subjects:	Computer Science - Cryptography and Security Computer Science - Learning Computer Science - Multimedia Computer Science - Sound
Online access:	Order full text

creator	Cai, Weicheng Wu, Haiwei Cai, Danwei Li, Ming
description	This paper describes our DKU replay detection system for the ASVspoof 2019 challenge. The goal is to develop a spoofing countermeasure for automatic speaker recognition in the physical access scenario. We address the countermeasure pipeline from four aspects: data augmentation, feature representation, classification, and fusion. First, we introduce an utterance-level deep learning framework for anti-spoofing that takes a variable-length feature sequence and outputs utterance-level scores directly. Based on this framework, we evaluate various input feature representations extracted from either the magnitude spectrum or the phase spectrum. We also perform data augmentation by applying speed perturbation to the raw waveform. Our best single system is a residual neural network trained on the speed-perturbed group delay gram. It achieves an EER of 1.04% on the development set and 1.08% on the evaluation set. Finally, simply averaging the scores of several single systems further improves performance: our primary system obtains an EER of 0.24% on the development set and 0.66% on the evaluation set.
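The fusion step and the EER metric named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_scores` and `equal_error_rate`, the label convention (1 = bona fide, 0 = replay), and the assumption that a higher score means "more likely bona fide" are all hypothetical choices made for illustration, and the toy scores are invented.

```python
from statistics import fmean


def fuse_scores(system_scores):
    """Equal-weight average of per-utterance scores from several systems.

    system_scores: list of score lists, one list per system, all in the
    same utterance order (an assumption of this sketch).
    """
    n = len(system_scores[0])
    if any(len(s) != n for s in system_scores):
        raise ValueError("all systems must score the same utterances")
    return [fmean(column) for column in zip(*system_scores)]


def equal_error_rate(scores, labels):
    """Approximate EER by scanning every observed score as a threshold.

    labels: 1 = bona fide, 0 = replay (hypothetical convention).
    An utterance is accepted when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one trial of each class")
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)) + [float("inf")]:
        # False acceptance: replay trials scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores from two hypothetical single systems on four utterances.
system_a = [0.9, 0.8, 0.3, 0.1]
system_b = [0.7, 0.6, 0.5, 0.3]
labels = [1, 1, 0, 0]
fused = fuse_scores([system_a, system_b])
fused_eer = equal_error_rate(fused, labels)
```

Scanning observed scores as thresholds gives only an approximation of the point where the false acceptance and false rejection rates cross; evaluation toolkits typically interpolate between operating points.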
doi_str_mv	10.48550/arxiv.1907.02663
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1907_02663</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1907_02663</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion</title><source>arXiv.org</source><creator>Cai, Weicheng ; Wu, Haiwei ; Cai, Danwei ; Li, Ming</creator><identifier>DOI: 10.48550/arxiv.1907.02663</identifier><language>eng</language><subject>Computer Science - Cryptography and Security ; Computer Science - Learning ; Computer Science - Multimedia ; Computer Science - Sound</subject><creationdate>2019-07</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/1907.02663$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1907.02663$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><format>journal</format><genre>article</genre><ristype>JOUR</ristype><date>2019-07-04</date><risdate>2019</risdate><doi>10.48550/arxiv.1907.02663</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1907.02663
language	eng
recordid	cdi_arxiv_primary_1907_02663
source	arXiv.org
subjects	Computer Science - Cryptography and Security Computer Science - Learning Computer Science - Multimedia Computer Science - Sound
title	The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T04%3A42%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20DKU%20Replay%20Detection%20System%20for%20the%20ASVspoof%202019%20Challenge:%20On%20Data%20Augmentation,%20Feature%20Representation,%20Classification,%20and%20Fusion&rft.au=Cai,%20Weicheng&rft.date=2019-07-04&rft_id=info:doi/10.48550/arxiv.1907.02663&rft_dat=%3Carxiv_GOX%3E1907_02663%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true