Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection

Although deep neural networks have shown promising performance on various tasks, even achieving human-level performance on some, they are susceptible to incorrect predictions under imperceptibly small perturbations of the input. Many previous works have proposed to defend against such adversarial attacks, either by robust inference or by detection of adversarial inputs. Yet most of them cannot effectively defend against whitebox attacks, in which the adversary has knowledge of the model and the defense. More importantly, they do not provide a convincing reason why the generated adversarial inputs successfully fool the target models. To address these shortcomings of the existing approaches, we hypothesize that adversarial inputs are tied to latent features that are susceptible to adversarial perturbation, which we call vulnerable features. Based on this intuition, we propose a minimax game formulation to disentangle the latent features of each instance into robust and vulnerable ones, using variational autoencoders with two latent spaces. We thoroughly validate our model against both blackbox and whitebox attacks on the MNIST, Fashion MNIST, and Cat & Dog datasets; the results show that adversarial inputs cannot bypass our detector without changing their semantics, in which case the attack has failed.
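The abstract describes disentangling each instance's latent features into a robust part and a vulnerable part with variational autoencoders that have two latent spaces, trained under a minimax game. As a minimal sketch of what such a two-latent-space VAE could look like in PyTorch, the code below splits the encoder output into a robust latent z_r and a vulnerable latent z_v, reconstructs from both, and classifies from z_r only. The class and variable names (DisentanglingVAE, z_r, z_v) and all sizes are illustrative assumptions; the paper's actual minimax objective, adversarial perturbation step, and detection rule are not reproduced here.

# Minimal sketch, not the authors' implementation: a VAE whose latent code is
# split into a "robust" part z_r and a "vulnerable" part z_v, as one plausible
# reading of the abstract. All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingVAE(nn.Module):
    def __init__(self, in_dim=784, hidden=400, z_dim=20, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Two latent spaces: robust (r) and vulnerable (v).
        self.mu_r = nn.Linear(hidden, z_dim)
        self.logvar_r = nn.Linear(hidden, z_dim)
        self.mu_v = nn.Linear(hidden, z_dim)
        self.logvar_v = nn.Linear(hidden, z_dim)
        # Decoder reconstructs the input from both latents together.
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))
        # Classifier that only sees the robust latent.
        self.cls = nn.Linear(z_dim, n_classes)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mu_r, logvar_r = self.mu_r(h), self.logvar_r(h)
        mu_v, logvar_v = self.mu_v(h), self.logvar_v(h)
        z_r = self.reparameterize(mu_r, logvar_r)
        z_v = self.reparameterize(mu_v, logvar_v)
        recon_logits = self.dec(torch.cat([z_r, z_v], dim=1))
        class_logits = self.cls(z_r)
        return recon_logits, class_logits, (mu_r, logvar_r), (mu_v, logvar_v)

def kl_term(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

# Usage with an MNIST-sized batch and placeholder labels:
model = DisentanglingVAE()
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
recon_logits, class_logits, (mu_r, lv_r), (mu_v, lv_v) = model(x)
loss = (F.binary_cross_entropy_with_logits(recon_logits, x.flatten(1))
        + kl_term(mu_r, lv_r) + kl_term(mu_v, lv_v)
        + F.cross_entropy(class_logits, y))
loss.backward()

Classifying from z_r alone is one way to reflect the abstract's intent that predictions rely on robust features, while z_v gives the reconstruction a place to put perturbation-sensitive information; the adversarial game that enforces this split is only described at a high level in the abstract and is not sketched above.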

Bibliographic details
Main authors: Joe, Byunggill; Hwang, Sung Ju; Shin, Insik
Format: Article
Language: English
Subjects: Computer Science - Learning; Statistics - Machine Learning
Online access: Order full text
DOI: 10.48550/arxiv.1909.04311
Date: 2019-09-10
Source: arXiv.org (https://arxiv.org/abs/1909.04311)