Analyzing and Improving the Robustness of Tabular Classifiers using Counterfactual Explanations

Bibliographic Details
Main Authors: Rasouli, Peyman; Yu, Ingrid Chieh
Format: Book
Language: Norwegian
Description
Abstract: Recent studies have revealed that Machine Learning (ML) models are vulnerable to adversarial perturbations. Such perturbations can be added to the original inputs, intentionally or accidentally, causing the classifier to misclassify the crafted samples. A widely used defense is to retrain the model on data points generated by various attack strategies; however, this yields a classifier that is robust only to those particular evasions and cannot defend against unknown or universal perturbations. Counterfactual explanations are a specific class of post-hoc explanation methods that identify the minimal modification to the input features needed to obtain a particular outcome from the model. Besides their resemblance to universal perturbations, counterfactual explanations can generate instances of specific classes, which makes them suitable for analyzing and improving a model's robustness. Rather than explaining the model's decisions in the deployment phase, we use the distance information obtained from counterfactuals to propose novel metrics for analyzing the robustness of tabular classifiers. Further, we introduce a decision-boundary modification approach that uses customized counterfactual data points to improve the robustness of the models without compromising their accuracy. Our framework addresses the robustness of black-box classifiers in the tabular setting, an under-explored research area. Through several experiments and evaluations, we demonstrate the efficacy of our approach in analyzing and improving the robustness of black-box tabular classifiers.
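
To make the distance-based intuition concrete, the following Python sketch illustrates the general idea under stated assumptions: it approximates a counterfactual for each sample by random search over a scikit-learn-style black-box classifier, uses the distance to that counterfactual as a robustness score (samples whose prediction flips under a small perturbation are less robust), and augments the training data with counterfactual points. The search procedure, the mean-distance score, and the original-label augmentation rule are all illustrative assumptions; the abstract does not specify the paper's actual metrics or boundary-modification method, and every function name here is hypothetical.

    # Minimal sketch, assuming a scikit-learn-style black-box classifier.
    # Not the paper's algorithm: the counterfactual search, the score, and
    # the augmentation rule below are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def find_counterfactual(model, x, n_trials=500, max_radius=2.0, seed=0):
        """Random-search approximation of the closest input (L2 distance)
        that the black-box model assigns to a different class than x."""
        rng = np.random.default_rng(seed)
        original = model.predict(x.reshape(1, -1))[0]
        best_cf, best_dist = None, np.inf
        for _ in range(n_trials):
            # Sample a perturbation with random direction and radius.
            delta = rng.normal(size=x.shape)
            delta *= rng.uniform(0.0, max_radius) / np.linalg.norm(delta)
            if model.predict((x + delta).reshape(1, -1))[0] != original:
                dist = np.linalg.norm(delta)
                if dist < best_dist:
                    best_cf, best_dist = x + delta, dist
        return best_cf, best_dist

    def robustness_score(model, X):
        """Mean distance to the nearest found counterfactual; samples whose
        prediction is easy to flip pull the score down."""
        dists = [find_counterfactual(model, x)[1] for x in X]
        finite = [d for d in dists if np.isfinite(d)]
        return float(np.mean(finite)) if finite else np.inf

    def augment_with_counterfactuals(model, X, y):
        """Add each counterfactual point back with its origin's label, so
        retraining pushes the boundary away from the training data."""
        extra_X, extra_y = [], []
        for x, label in zip(X, y):
            cf, _ = find_counterfactual(model, x)
            if cf is not None:
                extra_X.append(cf)
                extra_y.append(label)
        if not extra_X:
            return X, y
        return np.vstack([X, extra_X]), np.concatenate([y, extra_y])

    # Usage: score, augment, retrain, score again.
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    print("robustness before:", robustness_score(clf, X[:20]))
    X_aug, y_aug = augment_with_counterfactuals(clf, X[:50], y[:50])
    clf_retrained = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
    print("robustness after: ", robustness_score(clf_retrained, X[:20]))

In this reading, labeling each counterfactual with its origin's class forces the retrained model to require a larger perturbation before a prediction flips, which mirrors the abstract's goal of improving robustness while keeping the original training data, and hence accuracy, intact.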