Singular Regularization with Information Bottleneck Improves Model's Adversarial Robustness
Format: Article
Language: English
Abstract: Adversarial examples are one of the most severe threats to deep learning models. Numerous works have been proposed to study and defend against adversarial examples, but they lack an analysis of the adversarial information (perturbation) itself, so they can neither demystify adversarial examples nor provide a proper interpretation. In this paper, we aim to fill this gap by studying adversarial information as unstructured noise, i.e., noise without a clear pattern. Specifically, we conduct empirical studies with singular value decomposition, decomposing images into singular-component matrices, to analyze the adversarial information introduced by different attacks. Based on this analysis, we propose a new module that regularizes adversarial information and combines it with information bottleneck theory, which theoretically restricts intermediate representations; our method is therefore interpretable. Moreover, its design follows a novel principle that is general and unified. Equipped with the new module, we evaluate two popular model architectures on two mainstream datasets under various adversarial attacks. The results show a significant improvement in robust accuracy. We further show that our method is efficient, adding only a few parameters, and that it can be explained under regional-faithfulness analysis.
DOI: 10.48550/arxiv.2312.02237
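
The SVD-based analysis mentioned in the abstract can be illustrated with a minimal sketch. Nothing below is from the paper itself: the random image data, the rank cutoff `k`, and the tail-energy statistic are illustrative assumptions; the sketch only shows the general idea of decomposing an image into singular components and treating the spectral tail as a proxy for unstructured, noise-like information.

```python
# Minimal sketch (not the authors' code): decompose an image channel with SVD
# and compare the singular-value spectra of a clean image and a perturbed copy.
import numpy as np

def singular_spectrum(image: np.ndarray) -> np.ndarray:
    """Return the singular values of a single-channel image (H x W matrix)."""
    # SVD factors the image as U @ diag(s) @ Vt; s is the spectrum.
    _, s, _ = np.linalg.svd(image, full_matrices=False)
    return s

def low_rank_residual(image: np.ndarray, k: int) -> np.ndarray:
    """Residual after removing the top-k singular components.

    Structured content concentrates in the leading components, so the
    residual is a rough proxy for unstructured (noise-like) information.
    """
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return image - approx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32))                # stand-in for a clean image channel
    perturbed = clean + 0.03 * rng.standard_normal((32, 32))  # stand-in perturbation
    # Noise-like perturbations tend to inflate the tail of the spectrum.
    print("clean tail energy:", singular_spectrum(clean)[8:].sum())
    print("pert. tail energy:", singular_spectrum(perturbed)[8:].sum())
    print("residual norm after rank-8 removal:",
          np.linalg.norm(low_rank_residual(perturbed, k=8)))
```

Running this prints a larger tail energy for the perturbed channel, which is consistent with viewing adversarial information as unstructured noise concentrated outside the leading singular components.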
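
Likewise, the information-bottleneck regularization can be sketched in the spirit of the standard variational IB recipe (Alemi et al., 2017); the paper's actual module may well differ. The layer sizes, the `beta` weight, and the Gaussian prior are assumptions chosen only to make the example self-contained and runnable.

```python
# Hedged sketch of an information-bottleneck-style regularizer (variational IB
# recipe); not the paper's module. All dimensions and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBLayer(nn.Module):
    """Maps features to a stochastic code z and exposes a KL penalty that
    upper-bounds I(X; Z), i.e., restricts the intermediate representation."""
    def __init__(self, in_dim: int, code_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, code_dim)       # mean of q(z|x)
        self.log_var = nn.Linear(in_dim, code_dim)  # log-variance of q(z|x)

    def forward(self, h: torch.Tensor):
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick keeps sampling differentiable.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        # KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=1).mean()
        return z, kl

# Usage: place the layer between a backbone and a classifier head, then add
# beta * kl to the task loss so the bottleneck is tightened during training.
backbone, head = nn.Linear(128, 64), nn.Linear(32, 10)
ib = IBLayer(64, 32)
x, y = torch.randn(8, 128), torch.randint(0, 10, (8,))
z, kl = ib(F.relu(backbone(x)))
loss = F.cross_entropy(head(z), y) + 1e-3 * kl      # beta = 1e-3 is an assumption
loss.backward()
```

This design also illustrates the efficiency claim in the abstract: the only parameters such a bottleneck adds are the two small linear maps producing the code's mean and variance.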