Joint Universal Adversarial Perturbations with Interpretations

Deep neural networks (DNNs) have significantly boosted the performance of many challenging tasks. Despite the great development, DNNs have also exposed their vulnerability. Recent studies have shown that adversaries can manipulate the predictions of DNNs by adding a universal adversarial perturbatio...

Bibliographic details
Published in: arXiv.org 2024-08
Main authors: Liang-bo, Ning, Dai, Zeyu, Fan, Wenqi, Su, Jingran, Pan, Chao, Wang, Luning, Li, Qing
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Liang-bo, Ning
Dai, Zeyu
Fan, Wenqi
Su, Jingran
Pan, Chao
Wang, Luning
Li, Qing
description Deep neural networks (DNNs) have significantly boosted the performance of many challenging tasks. Despite the great development, DNNs have also exposed their vulnerability. Recent studies have shown that adversaries can manipulate the predictions of DNNs by adding a universal adversarial perturbation (UAP) to benign samples. On the other hand, increasing efforts have been made to help users understand and explain the inner working of DNNs by highlighting the most informative parts (i.e., attribution maps) of samples with respect to their predictions. Moreover, we first empirically find that such attribution maps between benign and adversarial examples have a significant discrepancy, which has the potential to detect universal adversarial perturbations for defending against adversarial attacks. This finding motivates us to further investigate a new research problem: whether there exist universal adversarial perturbations that are able to jointly attack DNNs classifier and its interpretation with malicious desires. It is challenging to give an explicit answer since these two objectives are seemingly conflicting. In this paper, we propose a novel attacking framework to generate joint universal adversarial perturbations (JUAP), which can fool the DNNs model and misguide the inspection from interpreters simultaneously. Comprehensive experiments on various datasets demonstrate the effectiveness of the proposed method JUAP for joint attacks. To the best of our knowledge, this is the first effort to study UAP for jointly attacking both DNNs and interpretations.
format Article
publisher Ithaca: Cornell University Library, arXiv.org
date 2024-08-03
rights 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-08
issn 2331-8422
language eng
recordid cdi_proquest_journals_3089689495
source Free E- Journals
subjects Artificial neural networks
Perturbation
title Joint Universal Adversarial Perturbations with Interpretations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A27%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Joint%20Universal%20Adversarial%20Perturbations%20with%20Interpretations&rft.jtitle=arXiv.org&rft.au=Liang-bo,%20Ning&rft.date=2024-08-03&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3089689495%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3089689495&rft_id=info:pmid/&rfr_iscdi=true