Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

It has become urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems, given the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while l...

Detailed description

Bibliographic details
Published in:	arXiv.org 2023-01
Main authors:	Dou, Yongqiang; Yang, Haocheng; Yang, Maolin; Xu, Yanyan; Ke, Dengfeng
Format:	Article
Language:	eng
Subjects:	Algorithms; Audio equipment; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer simulation; Entropy (Information theory); Spoofing; Statistics - Machine Learning; System effectiveness; Training
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Dou, Yongqiang; Yang, Haocheng; Yang, Maolin; Xu, Yanyan; Ke, Dengfeng
description	Advances in high-quality playback devices make it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but a lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing models should pay more attention to indistinguishable samples than to easily classified ones during training, so that correct discrimination is the top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of each sample. In the experiments, we select three kinds of features that together contain both magnitude-based and phase-based information, so that they are complementary and informative. Experimental results on the ASVspoof2019 dataset, compared against top-performing systems, demonstrate the superiority of the proposed methods. Systems trained with the balanced focal loss perform significantly better than those trained with conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data in addition to the simulated ASVspoof2019 data, indicating that anti-spoofing research still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
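The description says the balanced focal loss dynamically scales each sample's loss, but this record does not give the formula. The following is a minimal sketch that assumes the widely used alpha-balanced focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the values alpha=0.5 and gamma=2.0 are illustrative assumptions and are not taken from the D3M source code.

```python
import math


def cross_entropy(p_bonafide: float, label: int) -> float:
    """Plain binary cross-entropy for one utterance.

    p_bonafide: model probability that the utterance is bonafide.
    label: 1 for bonafide, 0 for spoofed.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def balanced_focal_loss(p_bonafide: float, label: int,
                        alpha: float = 0.5, gamma: float = 2.0) -> float:
    """Alpha-balanced focal loss for one utterance (assumed standard form).

    alpha weights the bonafide class (1 - alpha weights the spoofed class);
    gamma controls how strongly well-classified samples are down-weighted.
    Both defaults are hypothetical, not the paper's settings.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    # (1 - p_t) ** gamma shrinks toward 0 as the prediction becomes confident
    # and correct, so easy samples contribute little to the total loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.55):  # an easy and a hard bonafide sample
        print(p, cross_entropy(p, 1), balanced_focal_loss(p, 1))
```

With gamma set to 0 the modulating factor is 1 and the loss reduces to class-weighted cross-entropy, so under this assumed form the focusing on hard samples comes entirely from gamma.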
doi_str_mv	10.48550/arxiv.2006.14563
format	Article
publisher	Ithaca: Cornell University Library, arXiv.org
startdate	2023-01-18
rights	2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa	free_for_read
backlink	https://doi.org/10.1109/ICPR48806.2021.9412749 (View published paper; access to full text may be restricted)
backlink	https://doi.org/10.48550/arXiv.2006.14563 (View paper in arXiv)
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-01
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2006_14563
source	arXiv.org; Free E- Journals
subjects	Algorithms; Audio equipment; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer simulation; Entropy (Information theory); Spoofing; Statistics - Machine Learning; System effectiveness; Training
title	Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A30%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dynamically%20Mitigating%20Data%20Discrepancy%20with%20Balanced%20Focal%20Loss%20for%20Replay%20Attack%20Detection&rft.jtitle=arXiv.org&rft.au=Dou,%20Yongqiang&rft.date=2023-01-18&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2006.14563&rft_dat=%3Cproquest_arxiv%3E2417703486%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2417703486&rft_id=info:pmid/&rfr_iscdi=true