SCAAT: Improving Neural Network Interpretability via Saliency Constrained Adaptive Adversarial Training
Format: Article
Language: English
Abstract: Deep Neural Networks (DNNs) are expected to provide explanations that let users understand their black-box predictions. A saliency map is a common form of explanation, a heatmap of feature attributions, but it suffers from noise that makes the important features hard to distinguish. In this paper, we propose a model-agnostic learning method called Saliency Constrained Adaptive Adversarial Training (SCAAT) to improve the quality of such DNN interpretability. By constructing adversarial samples under the guidance of the saliency map, SCAAT effectively eliminates most of this noise and makes saliency maps sparser and more faithful, without any modification to the model architecture. We apply SCAAT to multiple DNNs and evaluate the quality of the generated saliency maps on various natural and pathological image datasets. Evaluations across different domains and metrics show that SCAAT significantly improves the interpretability of DNNs by providing more faithful saliency maps without sacrificing their predictive power.
DOI: 10.48550/arxiv.2311.05143
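The abstract describes the core mechanism only at a high level: adversarial samples are constructed under the guidance of a saliency map, and training on them suppresses saliency noise. As one way such a step could look, here is a minimal PyTorch sketch for an image classifier. The vanilla-gradient saliency, the quantile masking rule, and the FGSM-style perturbation are illustrative assumptions, not the paper's exact SCAAT algorithm.

```python
import torch
import torch.nn.functional as F

def saliency_guided_adversarial_loss(model, x, y, epsilon=0.01, quantile=0.5):
    """Illustrative sketch of one saliency-guided adversarial training step.

    Assumed reading of the SCAAT idea: saliency noise comes from unimportant
    features, so perturbations are applied only where saliency is low,
    pushing the model to stop relying on those features. The threshold rule
    and step size here are hypothetical choices, not the published method.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    # Vanilla gradient saliency: magnitude of d(loss)/d(input).
    saliency = grad.abs()

    # Mask selecting the least-salient features of each sample
    # (per-sample quantile threshold; an assumption for illustration).
    thresh = saliency.flatten(1).quantile(quantile, dim=1).view(-1, 1, 1, 1)
    low_saliency = (saliency <= thresh).float()

    # FGSM-style perturbation restricted to low-saliency features.
    x_adv = (x + epsilon * grad.sign() * low_saliency).detach()

    # Loss on the adversarial sample; the caller backpropagates and steps
    # the optimizer as in ordinary adversarial training.
    return F.cross_entropy(model(x_adv), y)
```

In this reading, the saliency map plays two roles per step: it is the explanation being cleaned up, and it steers where the adversarial perturbation is allowed, which is consistent with the abstract's claim that no architectural change is needed.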