Reducing classifier overconfidence against adversaries through graph algorithms

Bibliographic Details
Published in: Machine Learning 2023-07, Vol. 112 (7), p. 2619-2651
Main authors: Teixeira, Leonardo; Jalaian, Brian; Ribeiro, Bruno
Format: Article
Language: English
Description
Abstract: In this work we show that deep learning classifiers tend to become overconfident in their answers under adversarial attacks, even when the classifier is optimized to survive such attacks. Our work draws upon stochastic geometry and graph algorithms to propose a general framework that replaces the last fully connected layer and softmax output. This framework (a) can be applied to any classifier and (b) significantly reduces the classifier’s overconfidence in its output with little impact on its accuracy compared to the original adversarially trained classifiers. Its relative effectiveness increases as the attacker becomes more powerful. Our use of graph algorithms in adversarial learning is new and of independent interest. Finally, we show the advantages of this last-layer softmax replacement on image tasks under common adversarial attacks.
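
To make the architectural change described in the abstract concrete, the sketch below shows the general pattern of swapping a classifier's final fully connected layer and softmax for a replacement output head. This is a hedged illustration only, not the paper's algorithm: the actual head is built from stochastic geometry and graph algorithms, which are not reproduced here, and CustomHead, num_classes, and the ResNet-18 backbone are hypothetical choices made for the example.

import torch
import torch.nn as nn
import torchvision.models as models

class CustomHead(nn.Module):
    # Stand-in for the paper's stochastic-geometry/graph-based output
    # layer; here it is just a linear projection that returns raw class
    # scores instead of softmax probabilities.
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.proj = nn.Linear(in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# The framework is claimed to apply to any classifier; ResNet-18 is
# used here purely as an example backbone.
model = models.resnet18(weights=None)
model.fc = CustomHead(model.fc.in_features, num_classes=10)

# Sanity check: forward a dummy batch through the modified model.
scores = model(torch.randn(2, 3, 224, 224))
print(scores.shape)  # torch.Size([2, 10])
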
ISSN: 0885-6125
eISSN: 1573-0565
DOI: 10.1007/s10994-023-06307-y