Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting

Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, there are relatively fewer methods develop...

Detailed description

Bibliographic details
Main authors:	Ye, Zuzhao; Ciccarelli, Gregory; Kulis, Brian
Format:	Article
Language:	eng
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Ye, Zuzhao ; Ciccarelli, Gregory ; Kulis, Brian
description	Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, relatively few methods have been developed. Using adversarial learning as a starting point, we develop a simple and effective augmentation strategy based on taking the gradient of the entropy of the outputs with respect to the inputs and then creating new data points by moving in the direction of the gradient to maximize the entropy. We validate its efficacy on several keyword spotting tasks as well as standard audio benchmarks. Our method is straightforward to implement, offering greater computational efficiency than more complex adversarial schemes like GANs. Despite its simplicity, it proves robust and effective, especially when combined with the established SpecAugment technique, leading to enhanced performance.
doi_str_mv	10.48550/arxiv.2401.06897
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2401_06897</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2401_06897</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-87dff5cdaf6f194ef0f434892fda4ea2999c85b32bd3f97916fa8a2d3ed1138b3</originalsourceid><addsrcrecordid>eNotz71OwzAYhWEvHVDhApjwDST1X2J7jKLyI1ox0D36UvurLDVx5LqluXugsJx3O9JDyCNnpTJVxVaQruFSCsV4yWpj9R1pt3ANw3ko1mNOcZpp4y4-nSAFONLm7EL82cPgxww5xJFiTPTdz18xOfo5xZzDeLgnC4TjyT_8d0l2z-td-1psPl7e2mZTQK11YbRDrPYOsEZulUeGSipjBTpQHoS1dm-qXoreSbTa8hrBgHDSO86l6eWSPP3d3hTdlMIAae5-Nd1NI78BqG5GRQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting</title><source>arXiv.org</source><creator>Ye, Zuzhao ; Ciccarelli, Gregory ; Kulis, Brian</creator><creatorcontrib>Ye, Zuzhao ; Ciccarelli, Gregory ; Kulis, Brian</creatorcontrib><description>Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, there are relatively fewer methods developed. Using adversarial learning as a starting point, we develop a simple and effective augmentation strategy based on taking the gradient of the entropy of the outputs with respect to the inputs and then creating new data points by moving in the direction of the gradient to maximize the entropy. We validate its efficacy on several keyword spotting tasks as well as standard audio benchmarks. Our method is straightforward to implement, offering greater computational efficiency than more complex adversarial schemes like GANs. 
Despite its simplicity, it proves robust and effective, especially when combined with the established SpecAugment technique, leading to enhanced performance.</description><identifier>DOI: 10.48550/arxiv.2401.06897</identifier><language>eng</language><creationdate>2024-01</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2401.06897$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2401.06897$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ye, Zuzhao</creatorcontrib><creatorcontrib>Ciccarelli, Gregory</creatorcontrib><creatorcontrib>Kulis, Brian</creatorcontrib><title>Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting</title><description>Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, there are relatively fewer methods developed. Using adversarial learning as a starting point, we develop a simple and effective augmentation strategy based on taking the gradient of the entropy of the outputs with respect to the inputs and then creating new data points by moving in the direction of the gradient to maximize the entropy. We validate its efficacy on several keyword spotting tasks as well as standard audio benchmarks. Our method is straightforward to implement, offering greater computational efficiency than more complex adversarial schemes like GANs. 
Despite its simplicity, it proves robust and effective, especially when combined with the established SpecAugment technique, leading to enhanced performance.</description><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz71OwzAYhWEvHVDhApjwDST1X2J7jKLyI1ox0D36UvurLDVx5LqluXugsJx3O9JDyCNnpTJVxVaQruFSCsV4yWpj9R1pt3ANw3ko1mNOcZpp4y4-nSAFONLm7EL82cPgxww5xJFiTPTdz18xOfo5xZzDeLgnC4TjyT_8d0l2z-td-1psPl7e2mZTQK11YbRDrPYOsEZulUeGSipjBTpQHoS1dm-qXoreSbTa8hrBgHDSO86l6eWSPP3d3hTdlMIAae5-Nd1NI78BqG5GRQ</recordid><startdate>20240112</startdate><enddate>20240112</enddate><creator>Ye, Zuzhao</creator><creator>Ciccarelli, Gregory</creator><creator>Kulis, Brian</creator><scope>GOX</scope></search><sort><creationdate>20240112</creationdate><title>Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting</title><author>Ye, Zuzhao ; Ciccarelli, Gregory ; Kulis, Brian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-87dff5cdaf6f194ef0f434892fda4ea2999c85b32bd3f97916fa8a2d3ed1138b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Ye, Zuzhao</creatorcontrib><creatorcontrib>Ciccarelli, Gregory</creatorcontrib><creatorcontrib>Kulis, Brian</creatorcontrib><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ye, Zuzhao</au><au>Ciccarelli, Gregory</au><au>Kulis, Brian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting</atitle><date>2024-01-12</date><risdate>2024</risdate><abstract>Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. 
In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, there are relatively fewer methods developed. Using adversarial learning as a starting point, we develop a simple and effective augmentation strategy based on taking the gradient of the entropy of the outputs with respect to the inputs and then creating new data points by moving in the direction of the gradient to maximize the entropy. We validate its efficacy on several keyword spotting tasks as well as standard audio benchmarks. Our method is straightforward to implement, offering greater computational efficiency than more complex adversarial schemes like GANs. Despite its simplicity, it proves robust and effective, especially when combined with the established SpecAugment technique, leading to enhanced performance.</abstract><doi>10.48550/arxiv.2401.06897</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2401.06897
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2401_06897
source	arXiv.org
title	Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T13%3A31%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Maximum-Entropy%20Adversarial%20Audio%20Augmentation%20for%20Keyword%20Spotting&rft.au=Ye,%20Zuzhao&rft.date=2024-01-12&rft_id=info:doi/10.48550/arxiv.2401.06897&rft_dat=%3Carxiv_GOX%3E2401_06897%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
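
The augmentation step summarized in the description field (take the gradient of the output entropy with respect to the input, then move the input along that gradient) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the model is a toy linear softmax classifier rather than a deep keyword-spotting network, the gradient is derived by hand instead of by automatic differentiation, and the names `logits_of`, `entropy_grad_wrt_input`, `max_entropy_augment`, and the `step` size are hypothetical. For logits z = Wx + b and p = softmax(z), the entropy H = -sum_j p_j log p_j has dH/dz_j = -p_j (log p_j + H), and the chain rule gives dH/dx = W^T dH/dz.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def logits_of(W, b, x):
    # Toy linear classifier (an assumption, standing in for a deep network).
    return [sum(w * xk for w, xk in zip(row, x)) + bi for row, bi in zip(W, b)]


def entropy_grad_wrt_input(W, b, x):
    # dH/dz_j = -p_j * (log p_j + H), then dH/dx = W^T dH/dz.
    p = softmax(logits_of(W, b, x))
    h = entropy(p)
    dh_dz = [-pj * (math.log(max(pj, 1e-12)) + h) for pj in p]
    return [sum(W[i][k] * dh_dz[i] for i in range(len(W))) for k in range(len(x))]


def max_entropy_augment(W, b, x, step=0.1):
    # One gradient-ascent step on the entropy; `step` is a hypothetical value.
    g = entropy_grad_wrt_input(W, b, x)
    return [xk + step * gk for xk, gk in zip(x, g)]


if __name__ == "__main__":
    W = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
    b = [0.0, 0.0, 0.0]
    x = [2.0, -1.0]
    x_aug = max_entropy_augment(W, b, x)
    print("entropy before:", entropy(softmax(logits_of(W, b, x))))
    print("entropy after: ", entropy(softmax(logits_of(W, b, x_aug))))
```

In a training loop, the perturbed input would presumably keep the label of the original example and be added to the batch; that pairing, like the single fixed-size step used here, is an assumption of this sketch rather than a detail confirmed by the record.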