Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning
Format: Article
Language: English
Online access: Order full text
Abstract: Causal approaches to post-hoc explainability for black-box prediction models
(e.g., deep neural networks trained on image pixel data) have become
increasingly popular. However, existing approaches have two important
shortcomings: (i) the "explanatory units" are micro-level inputs into the
relevant prediction model, e.g., image pixels, rather than interpretable
macro-level features that are more useful for understanding how to possibly
change the algorithm's behavior, and (ii) existing approaches assume there
exists no unmeasured confounding between features and target model predictions,
which fails to hold when the explanatory units are macro-level variables. Our
focus is on the important setting where the analyst has no access to the inner
workings of the target prediction algorithm, but only the ability to query
the output of the model in response to a particular input. To provide causal
explanations in such a setting, we propose to learn causal graphical
representations that allow for arbitrary unmeasured confounding among features.
We demonstrate the resulting graph can differentiate between interpretable
features that causally influence model predictions versus those that are merely
associated with model predictions due to confounding. Our approach is motivated
by a counterfactual theory of causal explanation wherein good explanations
point to factors that are "difference-makers" in an interventionist sense.
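The sketch below illustrates the query-then-discover workflow described in the abstract: the analyst computes interpretable macro-level features, queries the black-box model for its predictions, and runs a constraint-based causal discovery algorithm that tolerates unmeasured confounding. It is not the authors' implementation; the `macro_features` and `predict` callables are hypothetical placeholders, and the FCI call assumes the open-source causal-learn package, whose exact API may differ by version.

```python
# Minimal sketch, assuming interpretable macro-level features are already
# defined and that only query access to the black-box model is available.
import numpy as np
from causallearn.search.ConstraintBased.FCI import fci  # assumed API from causal-learn


def query_black_box(predict, inputs):
    """Collect model outputs; the model's internals are never inspected."""
    return np.asarray([predict(x) for x in inputs])


def explain_with_causal_graph(inputs, macro_features, predict, alpha=0.05):
    """
    inputs:         iterable of raw inputs (e.g., image pixel arrays)
    macro_features: hypothetical helper mapping a raw input to a vector of
                    interpretable macro-level features
    predict:        black-box prediction function, treated as query-only
    """
    X = np.asarray([macro_features(x) for x in inputs])      # n x d feature matrix
    y = query_black_box(predict, inputs).reshape(-1, 1)      # n x 1 model predictions
    data = np.hstack([X, y])

    # FCI outputs a partial ancestral graph (PAG), which can represent arbitrary
    # latent confounding among the features. Edges oriented into the prediction
    # column suggest features that act as difference-makers, as opposed to
    # features merely associated with the predictions via confounding.
    pag, edges = fci(data, alpha=alpha)
    return pag, edges
```

As a usage note, the distinction the abstract draws would be read off the returned PAG: a directed edge from a feature into the prediction variable is consistent with a causal influence, whereas a bidirected edge indicates association attributable to an unmeasured common cause.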
DOI: 10.48550/arxiv.2006.02482