Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Counterfactual explanations provide ways of achieving a favorable model
outcome with minimum input perturbation. However, counterfactual explanations
can also be leveraged to reconstruct the model by strategically training a
surrogate model to give predictions similar to those of the original (target) model. In
this work, we analyze how model reconstruction using counterfactuals can be
improved by further leveraging the fact that the counterfactuals also lie quite
close to the decision boundary. Our main contribution is to use polytope theory
to derive novel theoretical relationships between the error in model
reconstruction and the number of counterfactual queries required. Our
theoretical analysis leads us to propose a strategy for model reconstruction
that we call the Counterfactual Clamping Attack (CCA), which trains a surrogate
model using a unique loss function that treats counterfactuals differently from
ordinary instances. Our approach also alleviates the related problem of
decision boundary shift that arises in existing model reconstruction approaches
when counterfactuals are treated as ordinary instances. Experimental results
demonstrate that our strategy improves fidelity between the target and
surrogate model predictions on several datasets. |
---|---|
DOI: | 10.48550/arxiv.2405.05369 |
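
The summary states that CCA trains the surrogate with a loss that treats counterfactuals differently from ordinary instances, but this record does not give the formula. The following is a minimal sketch of one way such a one-sided ("clamped") loss could look, not the authors' definition. It assumes binary classification, binary cross-entropy as the base loss, and a threshold `k` above which a counterfactual incurs no penalty. The names `bce`, `clamping_loss`, `is_cf`, and `k` are hypothetical.

```python
import numpy as np


def bce(p, y, eps=1e-12):
    """Elementwise binary cross-entropy between prediction p and target y."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def clamping_loss(pred, label, is_cf, k=0.5):
    """Mean loss that penalizes counterfactuals only while pred < k.

    pred  : surrogate probabilities of the favorable class, shape (n,)
    label : target-model labels in {0, 1}; ignored for counterfactual rows
    is_cf : boolean mask, True where the row is a counterfactual
    k     : hypothetical clamping threshold (an assumption of this sketch)
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    is_cf = np.asarray(is_cf, dtype=bool)

    # Ordinary instances: plain cross-entropy against the target's label.
    ordinary = bce(pred, label)

    # Counterfactuals: zero loss once pred >= k. Subtracting bce(k, k)
    # makes the penalty continuous (equal to zero) at pred == k.
    cf = np.where(pred < k, bce(pred, k) - bce(k, k), 0.0)

    return np.where(is_cf, cf, ordinary).mean()
```

Under these assumptions, a counterfactual that the surrogate already places on the favorable side contributes nothing, so it is not pushed further from the boundary. That is the intuition behind avoiding the decision boundary shift mentioned in the summary.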