Attack Agnostic Statistical Method for Adversarial Detection
Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online Access: | Order full text |
Summary: Deep Learning based AI systems have shown great promise in various domains such as vision, audio, and autonomous systems (vehicles, drones). Recent research on neural networks has shown the susceptibility of deep networks to adversarial attacks: a technique of adding small perturbations to the inputs that can fool a deep network into misclassifying them. Developing defenses against such adversarial attacks is an active research area, with some approaches proposing robust models that are immune to such adversaries, while other techniques attempt to detect such adversarial inputs. In this paper, we present a novel statistical approach for adversarial detection in image classification. Our approach is based on constructing a per-class feature distribution and detecting adversaries by comparing the features of a test image with the feature distribution of its class. For this purpose, we make use of statistical distances such as the Energy Distance (ED) and Maximum Mean Discrepancy (MMD) for adversarial detection, and analyze the performance of each metric. We experimentally show that our approach achieves good adversarial detection performance on the MNIST and CIFAR-10 datasets irrespective of the attack method, the sample size, and the degree of adversarial perturbation.
DOI: 10.48550/arxiv.1911.10008
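
The abstract describes scoring a test image by comparing its features against a per-class feature distribution using statistical distances such as ED and MMD. Below is a minimal NumPy sketch of those two distances, not the authors' implementation: the feature source (random stand-in vectors here, in place of real network features), the Gaussian RBF kernel, the bandwidth `sigma`, and the thresholding step are all illustrative assumptions, since this record contains only the abstract.

```python
# Minimal sketch (not the paper's code): Energy Distance (ED) and
# Maximum Mean Discrepancy (MMD) between a test image's features and a
# per-class reference feature distribution.
import numpy as np

def pairwise_dists(X, Y):
    """Euclidean distances between every row of X and every row of Y."""
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

def energy_distance(X, Y):
    """ED(P, Q) = 2*E||x - y|| - E||x - x'|| - E||y - y'||
    (biased V-statistic estimator; adequate for illustration)."""
    return (2 * pairwise_dists(X, Y).mean()
            - pairwise_dists(X, X).mean()
            - pairwise_dists(Y, Y).mean())

def mmd_sq(X, Y, sigma=1.0):
    """Squared MMD with Gaussian RBF kernel k(a,b) = exp(-||a-b||^2 / (2*sigma^2)).
    The kernel choice and bandwidth are assumptions, not taken from the paper."""
    k = lambda A, B: np.exp(-pairwise_dists(A, B) ** 2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Stand-ins for real features: ref_feats would hold features of clean
# training images from the predicted class; test_feats would hold
# features derived from the test input.
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(200, 64))
test_feats = rng.normal(loc=0.5, size=(32, 64))  # shifted, as an adversarial stand-in

print("ED   :", energy_distance(test_feats, ref_feats))
print("MMD^2:", mmd_sq(test_feats, ref_feats))
# A score above a per-class threshold (calibrated on clean validation
# data) would flag the input as adversarial.
```

In this setup, a larger distance between the test features and the class's reference distribution indicates a likely adversarial input; how the per-class thresholds and feature layer are chosen in the paper itself is not specified in the abstract.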