ADVERSARIAL CORRUPTION FOR ATTRIBUTION-BASED EXPLANATIONS VALIDATION

Bibliographic Details
Main Authors: Schneuwly, Arno, Schmidt, Felix, Kobayashi, Kenyu, Khasanova, Renata, Casserini, Matteo
Format: Patent
Language: English
Description
Summary: Herein are machine learning (ML) explainability (MLX) techniques that perturb a non-anomalous tuple to generate an anomalous tuple as adversarial input to any explainer that is based on feature attribution. In an embodiment, a computer generates, from a non-anomalous tuple, an anomalous tuple that contains a perturbed value of a perturbed feature. In the anomalous tuple, the perturbed value of the perturbed feature is modified to cause a change in reconstruction error for the anomalous tuple. The change in reconstruction error includes a decrease in reconstruction error of the perturbed feature and/or an increase in the sum of reconstruction error of all features other than the perturbed feature. After the perturbed value is modified, an attribution-based explainer automatically generates an explanation that identifies an identified feature as a cause of the anomalous tuple being anomalous. It is then detected whether the identified feature of the explanation is, or is not, the perturbed feature.
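To make the validation procedure concrete, the following is a minimal sketch in Python. It assumes a PCA-based reconstruction model standing in for the anomaly detector, a toy explainer that attributes an anomaly to the feature with the largest per-feature reconstruction error, and a simple grid search as the perturbation strategy. The helpers `reconstruction_errors`, `explain`, and `corruption_objective`, and the heuristic for choosing the perturbed feature, are illustrative assumptions rather than the embodiment's actual components.

```python
# Minimal sketch of the validation loop described above. The PCA "autoencoder",
# the per-feature-error explainer, and the grid-search perturbation are all
# illustrative stand-ins, not the patent's actual components.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: correlated, non-anomalous tuples with 4 features.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

# Fit a rank-2 PCA model whose reconstruction error drives anomaly detection.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:2]  # principal directions used to encode/decode tuples


def reconstruction_errors(x):
    """Per-feature squared reconstruction error of a single tuple."""
    x_hat = mean + (x - mean) @ W.T @ W
    return (x - x_hat) ** 2


def explain(x):
    """Toy attribution-based explainer: blame the feature with the largest
    per-feature reconstruction error for the tuple being anomalous."""
    return int(np.argmax(reconstruction_errors(x)))


# Start from a non-anomalous tuple and choose a feature to perturb. The
# heuristic below (pick the feature the model reconstructs most easily) is a
# sketch-level choice, not taken from the patent text.
x = X[0].copy()
perturbed_feature = int(np.argmax(np.diag(W.T @ W)))


def corruption_objective(value):
    """Lower is better: decrease the perturbed feature's own error while
    increasing the summed error of every other feature."""
    x_adv = x.copy()
    x_adv[perturbed_feature] = value
    errs = reconstruction_errors(x_adv)
    return errs[perturbed_feature] - (errs.sum() - errs[perturbed_feature])


# Crude grid search over candidate perturbed values (a stand-in for whatever
# optimization the embodiment actually performs).
candidates = np.linspace(x[perturbed_feature] - 10, x[perturbed_feature] + 10, 2001)
x_adv = x.copy()
x_adv[perturbed_feature] = min(candidates, key=corruption_objective)

# Detection step: does the explainer identify the perturbed feature as the cause?
identified_feature = explain(x_adv)
print("total reconstruction error (original -> corrupted):",
      reconstruction_errors(x).sum(), "->", reconstruction_errors(x_adv).sum())
print("perturbed feature:", perturbed_feature)
print("feature identified by the explainer:", identified_feature)
print("explainer identified the perturbed feature:",
      identified_feature == perturbed_feature)
```

In this sketch, the final comparison is the detection step of the abstract: if the explainer blames a feature other than the perturbed one, the attribution-based explanation fails the validation check for this adversarially corrupted tuple; if it identifies the perturbed feature, the explanation withstands this particular corruption.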