Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Machine learning models are being used in an increasing number of critical
applications; thus, securing their integrity and ownership is critical. Recent
studies observed that adversarial training and watermarking have a conflicting
interaction. This work introduces a novel framework to integrate adversarial
training with watermarking techniques to fortify against evasion attacks and
provide confident model verification in case of intellectual property theft. We
use adversarial training together with adversarial watermarks to train a robust
watermarked model. The key intuition is to use a higher perturbation budget to
generate adversarial watermarks compared to the budget used for adversarial
training, thus avoiding conflict. We use the MNIST and Fashion-MNIST datasets
to evaluate our proposed technique on various model stealing attacks. The
results obtained consistently outperform the existing baseline in terms of
robustness performance and further prove the resilience of this defense against
pruning and fine-tuning removal attacks. |
---|---|
DOI: | 10.48550/arxiv.2312.14260 |
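
The abstract's key intuition, a larger perturbation budget for adversarial watermarks than for adversarial training, can be illustrated with a minimal sketch. The record does not name the attack, the model, or the budget values, so everything here is an assumption: one-step FGSM stands in for the perturbation method, a logistic-regression model stands in for the classifier, random arrays stand in for MNIST images, and `EPS_TRAIN` and `EPS_WATERMARK` are hypothetical values. How the watermark samples are labeled and verified is not shown.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One-step FGSM against a logistic-regression model (illustrative stand-in)."""
    # Probability of class 1 under the linear model.
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    # Gradient of the binary cross-entropy loss with respect to the input.
    grad_x = (p - y)[:, None] * w[None, :]
    # Step in the sign direction, then clip back to the valid pixel range [0, 1].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 784))      # stand-in for flattened 28x28 images
y = rng.integers(0, 2, size=8).astype(float)  # stand-in binary labels
w = rng.normal(0.0, 0.1, size=784)            # stand-in model weights
b = 0.0

EPS_TRAIN = 0.1      # hypothetical budget for adversarial training
EPS_WATERMARK = 0.3  # hypothetical, larger budget for adversarial watermarks

x_adv = fgsm(x, y, w, b, EPS_TRAIN)      # samples for adversarial training
x_wm = fgsm(x, y, w, b, EPS_WATERMARK)   # candidate watermark samples

# L-infinity distance of each perturbed set from the clean inputs.
linf_adv = np.abs(x_adv - x).max()
linf_wm = np.abs(x_wm - x).max()
print(linf_adv, linf_wm)
```

Under these assumptions the watermark samples lie farther from the clean inputs than the adversarial-training samples, which is the separation the abstract describes as avoiding conflict between the two objectives.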