Adversarial Robustification via Text-to-Image Diffusion Models
Main author(s):
Format: Article
Language: English
Online access: Order full text
Abstract: Adversarial robustness has conventionally been considered a challenging property to encode in neural networks, one that requires plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or impractical, and most such models were not originally trained with adversarial robustness in mind. In this paper, we develop a scalable and model-agnostic solution that achieves adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as "adaptable" denoisers that can be optimized for target tasks. Based on this, we propose (a) a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) leveraging a few synthetic reference images generated by the text-to-image model to enable novel adaptation schemes. Our experiments show that our data-free scheme, applied to pre-trained CLIP, improves the (provable) adversarial robustness of its diverse zero-shot classification derivatives while maintaining their accuracy, significantly surpassing prior approaches that use the full training data. Beyond CLIP, we also demonstrate that our framework readily and efficiently robustifies other visual classifiers.
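To make the denoise-and-classify idea concrete, below is a minimal sketch of such a pipeline in the style of denoised smoothing, not the paper's actual implementation. The names `denoiser`, `classifier`, `sigma`, and `n_samples` are illustrative placeholders: `denoiser` stands in for a text-to-image diffusion model used as a one-step denoiser, and `classifier` for a zero-shot head such as CLIP.

```python
import torch


def denoise_and_classify(x, denoiser, classifier, sigma=0.25, n_samples=100):
    """Denoised-smoothing-style prediction.

    Perturb the input with Gaussian noise, denoise each noisy copy, classify
    it, and return the majority-vote label over all samples.

    `denoiser(noisy, sigma)` and `classifier(image)` are placeholder callables
    for a diffusion-based denoiser and a zero-shot classifier, respectively.
    """
    votes = None
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)   # Gaussian corruption
        denoised = denoiser(noisy, sigma)         # diffusion model as denoiser
        logits = classifier(denoised)             # zero-shot classification
        preds = logits.argmax(dim=-1)
        one_hot = torch.nn.functional.one_hot(preds, logits.shape[-1])
        votes = one_hot if votes is None else votes + one_hot
    return votes.argmax(dim=-1)                   # majority vote per input
```

A certified robustness radius could then be derived from the vote counts as in randomized smoothing; that step is omitted here.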
DOI: 10.48550/arxiv.2407.18658