Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while l...

Bibliographic details
Published in: arXiv.org 2023-01
Main authors: Dou, Yongqiang; Yang, Haocheng; Yang, Maolin; Xu, Yanyan; Ke, Dengfeng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Dou, Yongqiang
Yang, Haocheng
Yang, Maolin
Xu, Yanyan
Ke, Dengfeng
description Advances in high-quality playback devices make it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing models should pay more attention to indistinguishable samples than to easily classified ones during modeling, so that correct discrimination is the top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of each sample. In the experiments, we also select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods in comparison with top-performing systems. Systems trained with the balanced focal loss perform significantly better than those trained with conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data in addition to the simulated ASVspoof2019 data, indicating that anti-spoofing research still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
doi_str_mv 10.48550/arxiv.2006.14563
format Article
fullrecord <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2006_14563</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2417703486</sourcerecordid><originalsourceid>FETCH-LOGICAL-a953-e7169c7f14aab23358dac20a51aaa1442151add3a05c2da23ebfa2291e3d55853</originalsourceid><addsrcrecordid>eNotkF9LwzAUxYMgOOY-gE8GfO5MbpL-eZyrU6EiyB6FcpemM7Nra5Kp_fbWzad7L_zO5ZxDyBVnc5kqxW7R_divOTAWz7lUsTgjExCCR6kEuCAz73eMMYgTUEpMyFs-tLi3GptmoM822C0G225pjgFpbr12psdWD_Tbhnd6h814mIquulFBi857WneOvpq-wYEuQkD9QXMTjA62ay_JeY2NN7P_OSXr1f16-RgVLw9Py0URYaZEZBIeZzqpuUTcjFZVWqEGhoojIpcS-LhVlUCmNFQIwmxqBMi4EZVSqRJTcn16e4xe9s7u0Q3lXwXlsYKRuDkRves-D8aHctcdXDt6KkHyJGFCprH4BTNIXrY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2417703486</pqid></control><display><type>article</type><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, Dengfeng</creator><creatorcontrib>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, Dengfeng</creatorcontrib><description>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. 
Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2006.14563</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Audio equipment ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning ; Computer simulation ; Entropy (Information theory) ; Spoofing ; Statistics - Machine Learning ; System effectiveness ; Training</subject><ispartof>arXiv.org, 2023-01</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,784,885,27925</link.rule.ids><backlink>$$Uhttps://doi.org/10.1109/ICPR48806.2021.9412749$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2006.14563$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Dou, Yongqiang</creatorcontrib><creatorcontrib>Yang, Haocheng</creatorcontrib><creatorcontrib>Yang, Maolin</creatorcontrib><creatorcontrib>Xu, Yanyan</creatorcontrib><creatorcontrib>Ke, Dengfeng</creatorcontrib><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><title>arXiv.org</title><description>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. 
Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</description><subject>Algorithms</subject><subject>Audio equipment</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><subject>Computer simulation</subject><subject>Entropy (Information theory)</subject><subject>Spoofing</subject><subject>Statistics - Machine Learning</subject><subject>System 
effectiveness</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotkF9LwzAUxYMgOOY-gE8GfO5MbpL-eZyrU6EiyB6FcpemM7Nra5Kp_fbWzad7L_zO5ZxDyBVnc5kqxW7R_divOTAWz7lUsTgjExCCR6kEuCAz73eMMYgTUEpMyFs-tLi3GptmoM822C0G225pjgFpbr12psdWD_Tbhnd6h814mIquulFBi857WneOvpq-wYEuQkD9QXMTjA62ay_JeY2NN7P_OSXr1f16-RgVLw9Py0URYaZEZBIeZzqpuUTcjFZVWqEGhoojIpcS-LhVlUCmNFQIwmxqBMi4EZVSqRJTcn16e4xe9s7u0Q3lXwXlsYKRuDkRves-D8aHctcdXDt6KkHyJGFCprH4BTNIXrY</recordid><startdate>20230118</startdate><enddate>20230118</enddate><creator>Dou, Yongqiang</creator><creator>Yang, Haocheng</creator><creator>Yang, Maolin</creator><creator>Xu, Yanyan</creator><creator>Ke, Dengfeng</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20230118</creationdate><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><author>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, 
Dengfeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a953-e7169c7f14aab23358dac20a51aaa1442151add3a05c2da23ebfa2291e3d55853</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Audio equipment</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><topic>Computer simulation</topic><topic>Entropy (Information theory)</topic><topic>Spoofing</topic><topic>Statistics - Machine Learning</topic><topic>System effectiveness</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Dou, Yongqiang</creatorcontrib><creatorcontrib>Yang, Haocheng</creatorcontrib><creatorcontrib>Yang, Maolin</creatorcontrib><creatorcontrib>Xu, Yanyan</creatorcontrib><creatorcontrib>Ke, Dengfeng</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer 
Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dou, Yongqiang</au><au>Yang, Haocheng</au><au>Yang, Maolin</au><au>Xu, Yanyan</au><au>Ke, Dengfeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</atitle><jtitle>arXiv.org</jtitle><date>2023-01-18</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. 
With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2006.14563</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-01
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2006_14563
source arXiv.org; Free E- Journals
subjects Algorithms
Audio equipment
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
Computer simulation
Entropy (Information theory)
Spoofing
Statistics - Machine Learning
System effectiveness
Training
title Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A30%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dynamically%20Mitigating%20Data%20Discrepancy%20with%20Balanced%20Focal%20Loss%20for%20Replay%20Attack%20Detection&rft.jtitle=arXiv.org&rft.au=Dou,%20Yongqiang&rft.date=2023-01-18&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2006.14563&rft_dat=%3Cproquest_arxiv%3E2417703486%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2417703486&rft_id=info:pmid/&rfr_iscdi=true
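
The description above says the training objective is a balanced focal loss that scales each sample's loss according to how hard the sample is. The record does not give the formula, so the following is only a minimal sketch that assumes the commonly used alpha-balanced form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def cross_entropy(p_pos, label, eps=1e-12):
    """Plain binary cross-entropy for one sample.

    p_pos is the predicted probability of the positive class (label 1).
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p_pos if label == 1 else 1.0 - p_pos
    return -math.log(min(max(p_t, eps), 1.0))


def balanced_focal_loss(p_pos, label, alpha=0.5, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for one sample (assumed form, see lead-in).

    alpha weights the positive class and (1 - alpha) the negative class.
    gamma controls how strongly well-classified samples are down-weighted.
    """
    p_t = p_pos if label == 1 else 1.0 - p_pos
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    # (1 - p_t) ** gamma shrinks toward 0 as the prediction becomes confident
    # and correct, so easy samples contribute little to the total loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easily classified sample versus a borderline one, both with label 1.
easy = balanced_focal_loss(0.95, 1)
hard = balanced_focal_loss(0.55, 1)
```

With `gamma=0` and `alpha=0.5` the sketch reduces to half of the plain cross-entropy, so the modulating factor is the only thing that shifts weight toward hard-to-distinguish samples.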