Quantifying the preferential direction of the model gradient in adversarial training with projected gradient descent


Bibliographic Details
Published in: Pattern Recognition, 2023-07, Vol. 139, Article 109430
Main authors: Bigolin Lanfredi, Ricardo; Schroeder, Joyce D.; Tasdizen, Tolga
Format: Article
Language: English
Online access: Full text
Description
Abstract:
• Input gradients of adversarially trained robust models show a preferred direction.
• These gradients point more directly toward the closest point of incorrect classes.
• Generative adversarial networks estimate this direction and its alignment metric.
• The metric correlates with robustness for models trained with projected gradient descent.
• Enforcing the proposed alignment increases the robustness of models.

Adversarial training, especially with projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, the gradients of models with respect to their inputs have a preferential direction. However, this direction of alignment is not mathematically well established, making it difficult to evaluate quantitatively. We propose a novel definition of this direction as the direction of the vector pointing toward the closest point of the support of the closest incorrect class in decision space. To evaluate the alignment with this direction after adversarial training, we apply a metric that uses generative adversarial networks to produce the smallest residual needed to change the class present in the image. We show that PGD-trained models have a higher alignment than the baseline according to our definition, that our metric yields higher alignment values than a competing metric formulation, and that enforcing this alignment increases the robustness of models.
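To make the PGD attack underlying the adversarial training discussed above concrete, the following is a minimal sketch of an L-infinity PGD attack on a toy linear softmax classifier. It is not the paper's implementation: the model, weights, and hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions, and the gradient is computed in closed form for the linear case rather than by automatic differentiation.

```python
# Minimal L-infinity PGD attack sketch on a toy linear softmax classifier
# (NumPy only). Illustrative, not the paper's implementation.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pgd_attack(x, y, W, b, eps=0.3, alpha=0.05, steps=10):
    """Maximize the cross-entropy loss of (x, y) within an eps L-inf ball."""
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        # For a linear model, dL/dx = W^T (p - onehot(y)).
        grad = W.T @ (p - onehot)
        x_adv = x_adv + alpha * np.sign(grad)       # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project onto L-inf ball
    return x_adv

# Usage: the attack stays inside the eps-ball and lowers the
# true-class probability.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), np.zeros(3)
x, y = rng.normal(size=5), 1
x_adv = pgd_attack(x, y, W, b)
p_clean = softmax(W @ x + b)[y]
p_adv = softmax(W @ x_adv + b)[y]
```

Adversarial training then replaces each clean example `x` with `pgd_attack(x, y, ...)` during the model's parameter updates; the paper's alignment metric concerns the direction of the resulting input gradients.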
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.109430