Searching for the Essence of Adversarial Perturbations
Saved in:
Main Authors: , ,
Format: Article
Language: English
Subjects:
Online Access: Order full text
Summary: Neural networks have demonstrated state-of-the-art performance in various
machine learning fields. However, the introduction of malicious perturbations
in input data, known as adversarial examples, has been shown to deceive neural
network predictions. This poses potential risks for real-world applications
such as autonomous driving and text identification. In order to mitigate these
risks, a comprehensive understanding of the mechanisms underlying adversarial
examples is essential. In this study, we demonstrate that adversarial
perturbations contain human-recognizable information, which is the key
conspirator responsible for a neural network's incorrect prediction, in
contrast to the widely held belief that human-unidentifiable characteristics
play a critical role in fooling a network. This concept of human-recognizable
characteristics enables us to explain key features of adversarial
perturbations, including their existence, transferability among different
neural networks, and increased interpretability for adversarial training. We
also uncover two unique properties of adversarial perturbations that deceive
neural networks: masking and generation. Additionally, a special class, the
complementary class, is identified when neural networks classify input images.
The presence of human-recognizable information in adversarial perturbations
allows researchers to gain insight into the working principles of neural
networks and may lead to the development of techniques for detecting and
defending against adversarial attacks.
DOI: 10.48550/arxiv.2205.15357
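The abstract refers to adversarial perturbations, i.e. small additive changes to an input that flip a network's prediction while remaining subtle to humans. For reference only, the sketch below illustrates the classic fast gradient sign method (FGSM) for crafting such a perturbation in PyTorch; the function name `fgsm_perturbation` and its parameters are illustrative assumptions and do not represent the masking/generation analysis described in this paper.

```python
# Minimal FGSM-style sketch (Goodfellow et al.), shown only to illustrate what an
# "adversarial perturbation" is; this is NOT the method proposed in the paper.
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, label, epsilon=0.03):
    """Return an additive perturbation that nudges `model` toward a wrong
    prediction on input `x` (an image tensor with a batch dimension)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)  # loss w.r.t. the true label
    loss.backward()                          # gradient of the loss w.r.t. the input
    # Step in the direction that increases the loss, bounded elementwise by epsilon.
    delta = epsilon * x.grad.sign()
    return delta.detach()

# Hypothetical usage: x_adv = image + fgsm_perturbation(net, image, y)
```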