Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection

Although deep neural networks have shown promising performance on various tasks, even achieving human-level performance on some, they are susceptible to incorrect predictions under imperceptibly small perturbations of the input. Many previous works have proposed to defend against such adversarial attacks, either by robust inference or by detection of adversarial inputs. Yet most of them cannot effectively defend against whitebox attacks, in which the adversary has knowledge of the model and the defense. More importantly, they do not provide a convincing reason why the generated adversarial inputs successfully fool the target models. To address these shortcomings of the existing approaches, we hypothesize that adversarial inputs are tied to latent features that are susceptible to adversarial perturbation, which we call vulnerable features. Based on this intuition, we propose a minimax game formulation to disentangle the latent features of each instance into robust and vulnerable ones, using variational autoencoders with two latent spaces. We thoroughly validate our model against both blackbox and whitebox attacks on the MNIST, Fashion MNIST, and Cat & Dog datasets; the results show that adversarial inputs cannot bypass our detector without changing their semantics, in which case the attack has failed.
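The abstract describes disentangling each instance's latent features into a robust part and a vulnerable part with variational autoencoders that have two latent spaces, trained under a minimax game. As a minimal sketch of what such a two-latent-space VAE could look like in PyTorch, the code below splits the encoder output into a robust latent z_r and a vulnerable latent z_v, reconstructs from both, and classifies from z_r only. The class and variable names (DisentanglingVAE, z_r, z_v) and all sizes are illustrative assumptions; the paper's actual minimax objective, adversarial perturbation step, and detection rule are not reproduced here.

# Minimal sketch, not the authors' implementation: a VAE whose latent code is
# split into a "robust" part z_r and a "vulnerable" part z_v, as one plausible
# reading of the abstract. All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingVAE(nn.Module):
    def __init__(self, in_dim=784, hidden=400, z_dim=20, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Two latent spaces: robust (r) and vulnerable (v).
        self.mu_r = nn.Linear(hidden, z_dim)
        self.logvar_r = nn.Linear(hidden, z_dim)
        self.mu_v = nn.Linear(hidden, z_dim)
        self.logvar_v = nn.Linear(hidden, z_dim)
        # Decoder reconstructs the input from both latents together.
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))
        # Classifier that only sees the robust latent.
        self.cls = nn.Linear(z_dim, n_classes)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mu_r, logvar_r = self.mu_r(h), self.logvar_r(h)
        mu_v, logvar_v = self.mu_v(h), self.logvar_v(h)
        z_r = self.reparameterize(mu_r, logvar_r)
        z_v = self.reparameterize(mu_v, logvar_v)
        recon_logits = self.dec(torch.cat([z_r, z_v], dim=1))
        class_logits = self.cls(z_r)
        return recon_logits, class_logits, (mu_r, logvar_r), (mu_v, logvar_v)

def kl_term(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

# Usage with an MNIST-sized batch and placeholder labels:
model = DisentanglingVAE()
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
recon_logits, class_logits, (mu_r, lv_r), (mu_v, lv_v) = model(x)
loss = (F.binary_cross_entropy_with_logits(recon_logits, x.flatten(1))
        + kl_term(mu_r, lv_r) + kl_term(mu_v, lv_v)
        + F.cross_entropy(class_logits, y))
loss.backward()

Classifying from z_r alone is one way to reflect the abstract's intent that predictions rely on robust features, while z_v gives the reconstruction a place to put perturbation-sensitive information; the adversarial game that enforces this split is only described at a high level in the abstract and is not sketched above.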

Bibliographic details
Main authors: Joe, Byunggill; Hwang, Sung Ju; Shin, Insik
Format: Article
Language: English
Subjects: Computer Science - Learning; Statistics - Machine Learning
Online access: Order full text
DOI: 10.48550/arxiv.1909.04311
Date: 2019-09-10
Source: arXiv.org (https://arxiv.org/abs/1909.04311)