Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while l...
Published in: | arXiv.org 2023-01 |
---|---|
Main authors: | Dou, Yongqiang; Yang, Haocheng; Yang, Maolin; Xu, Yanyan; Ke, Dengfeng |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Dou, Yongqiang; Yang, Haocheng; Yang, Maolin; Xu, Yanyan; Ke, Dengfeng |
description | It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M. |
doi_str_mv | 10.48550/arxiv.2006.14563 |
format | Article |
fullrecord | <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2006_14563</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2417703486</sourcerecordid><originalsourceid>FETCH-LOGICAL-a953-e7169c7f14aab23358dac20a51aaa1442151add3a05c2da23ebfa2291e3d55853</originalsourceid><addsrcrecordid>eNotkF9LwzAUxYMgOOY-gE8GfO5MbpL-eZyrU6EiyB6FcpemM7Nra5Kp_fbWzad7L_zO5ZxDyBVnc5kqxW7R_divOTAWz7lUsTgjExCCR6kEuCAz73eMMYgTUEpMyFs-tLi3GptmoM822C0G225pjgFpbr12psdWD_Tbhnd6h814mIquulFBi857WneOvpq-wYEuQkD9QXMTjA62ay_JeY2NN7P_OSXr1f16-RgVLw9Py0URYaZEZBIeZzqpuUTcjFZVWqEGhoojIpcS-LhVlUCmNFQIwmxqBMi4EZVSqRJTcn16e4xe9s7u0Q3lXwXlsYKRuDkRves-D8aHctcdXDt6KkHyJGFCprH4BTNIXrY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2417703486</pqid></control><display><type>article</type><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, Dengfeng</creator><creatorcontrib>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, Dengfeng</creatorcontrib><description>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. 
Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2006.14563</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Audio equipment ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning ; Computer simulation ; Entropy (Information theory) ; Spoofing ; Statistics - Machine Learning ; System effectiveness ; Training</subject><ispartof>arXiv.org, 2023-01</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,784,885,27925</link.rule.ids><backlink>$$Uhttps://doi.org/10.1109/ICPR48806.2021.9412749$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2006.14563$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Dou, Yongqiang</creatorcontrib><creatorcontrib>Yang, Haocheng</creatorcontrib><creatorcontrib>Yang, Maolin</creatorcontrib><creatorcontrib>Xu, Yanyan</creatorcontrib><creatorcontrib>Ke, Dengfeng</creatorcontrib><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><title>arXiv.org</title><description>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. 
Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</description><subject>Algorithms</subject><subject>Audio equipment</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><subject>Computer simulation</subject><subject>Entropy (Information theory)</subject><subject>Spoofing</subject><subject>Statistics - Machine Learning</subject><subject>System 
effectiveness</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotkF9LwzAUxYMgOOY-gE8GfO5MbpL-eZyrU6EiyB6FcpemM7Nra5Kp_fbWzad7L_zO5ZxDyBVnc5kqxW7R_divOTAWz7lUsTgjExCCR6kEuCAz73eMMYgTUEpMyFs-tLi3GptmoM822C0G225pjgFpbr12psdWD_Tbhnd6h814mIquulFBi857WneOvpq-wYEuQkD9QXMTjA62ay_JeY2NN7P_OSXr1f16-RgVLw9Py0URYaZEZBIeZzqpuUTcjFZVWqEGhoojIpcS-LhVlUCmNFQIwmxqBMi4EZVSqRJTcn16e4xe9s7u0Q3lXwXlsYKRuDkRves-D8aHctcdXDt6KkHyJGFCprH4BTNIXrY</recordid><startdate>20230118</startdate><enddate>20230118</enddate><creator>Dou, Yongqiang</creator><creator>Yang, Haocheng</creator><creator>Yang, Maolin</creator><creator>Xu, Yanyan</creator><creator>Ke, Dengfeng</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20230118</creationdate><title>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</title><author>Dou, Yongqiang ; Yang, Haocheng ; Yang, Maolin ; Xu, Yanyan ; Ke, 
Dengfeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a953-e7169c7f14aab23358dac20a51aaa1442151add3a05c2da23ebfa2291e3d55853</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Audio equipment</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><topic>Computer simulation</topic><topic>Entropy (Information theory)</topic><topic>Spoofing</topic><topic>Statistics - Machine Learning</topic><topic>System effectiveness</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Dou, Yongqiang</creatorcontrib><creatorcontrib>Yang, Haocheng</creatorcontrib><creatorcontrib>Yang, Maolin</creatorcontrib><creatorcontrib>Xu, Yanyan</creatorcontrib><creatorcontrib>Ke, Dengfeng</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv 
Statistics</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dou, Yongqiang</au><au>Yang, Haocheng</au><au>Yang, Maolin</au><au>Xu, Yanyan</au><au>Ke, Dengfeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection</atitle><jtitle>arXiv.org</jtitle><date>2023-01-18</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. 
With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2006.14563</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2006_14563 |
source | arXiv.org; Free E- Journals |
subjects | Algorithms; Audio equipment; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer simulation; Entropy (Information theory); Spoofing; Statistics - Machine Learning; System effectiveness; Training |
title | Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A30%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dynamically%20Mitigating%20Data%20Discrepancy%20with%20Balanced%20Focal%20Loss%20for%20Replay%20Attack%20Detection&rft.jtitle=arXiv.org&rft.au=Dou,%20Yongqiang&rft.date=2023-01-18&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2006.14563&rft_dat=%3Cproquest_arxiv%3E2417703486%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2417703486&rft_id=info:pmid/&rfr_iscdi=true |
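
The description field of this record says the training objective is a balanced focal loss that dynamically scales the loss according to the sample itself. The record does not give the formula, so the following is only a minimal sketch of the widely used alpha-balanced focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and may differ from the exact formulation used in D3M. The function names, the argument names `p_spoof` and `is_spoof`, and the default values of `alpha` and `gamma` are illustrative assumptions, not values taken from the paper or its repository.

```python
import math


def cross_entropy(p_spoof, is_spoof):
    """Plain binary cross-entropy for one sample (comparison baseline)."""
    eps = 1e-12
    # p_t is the probability the model assigned to the true class.
    p_t = p_spoof if is_spoof else 1.0 - p_spoof
    p_t = min(max(p_t, eps), 1.0)
    return -math.log(p_t)


def balanced_focal_loss(p_spoof, is_spoof, alpha=0.5, gamma=2.0):
    """Alpha-balanced focal loss for one binary prediction.

    p_spoof  -- predicted probability of the positive class (hypothetical name)
    is_spoof -- True if the sample belongs to the positive class
    alpha    -- class-balancing weight (assumed value, not from the paper)
    gamma    -- focusing parameter (assumed value, not from the paper)
    """
    eps = 1e-12
    p_t = p_spoof if is_spoof else 1.0 - p_spoof
    p_t = min(max(p_t, eps), 1.0)
    alpha_t = alpha if is_spoof else 1.0 - alpha
    # The factor (1 - p_t) ** gamma shrinks the loss of confidently
    # correct samples, so hard samples dominate the training signal.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.6, 0.2):
        print(p, cross_entropy(p, True), balanced_focal_loss(p, True))
```

With `gamma = 0` the modulating factor is 1 and the function reduces to class-weighted cross-entropy. Larger `gamma` values suppress easily classified samples more strongly, which matches the abstract's stated aim of paying more attention to indistinguishable samples.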