ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability
Format: Article
Language: English
Online Access: Order full text
Abstract: Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is very difficult, as it includes not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator's ability to predict the explained model's outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to enact, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language models (LLMs) as simulators to approximate the evaluation and report various analyses to make such approximations reliable. Our method allows for scalable and consistent evaluation across various models and datasets. We report a comprehensive empirical evaluation using this framework and show that LLMs provide consistent rankings of explanation methods. Code available at https://github.com/AnonymousConSim/ConSim.
DOI: 10.48550/arxiv.2501.05855
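
The evaluation the abstract describes reduces to a simple comparison: how accurately a simulator predicts the explained model's outputs with the concept explanations versus without them. The sketch below is a minimal illustration of that idea only; the function name `simulatability_gain` and the `simulate` callable (standing in for an LLM simulator query) are assumptions for exposition, not the API of the released ConSim code.

```python
from typing import Callable, Optional, Sequence


def simulatability_gain(
    inputs: Sequence[str],
    model_outputs: Sequence[str],
    explanations: Sequence[str],
    simulate: Callable[[str, Optional[str]], str],
) -> float:
    """Illustrative simulatability score: improvement in a simulator's
    accuracy at predicting the explained model's outputs once it is
    shown concept-based explanations.

    `simulate(x, explanation)` is a hypothetical stand-in for an LLM
    simulator call; it returns the simulator's predicted label for
    input `x`, optionally conditioned on an explanation.
    """
    def accuracy(with_explanations: bool) -> float:
        correct = 0
        for x, y, e in zip(inputs, model_outputs, explanations):
            pred = simulate(x, e if with_explanations else None)
            correct += int(pred.strip() == y.strip())
        return correct / len(inputs)

    # Baseline: the simulator sees only the inputs.
    baseline = accuracy(with_explanations=False)
    # Explained: the simulator additionally sees the concept explanations.
    explained = accuracy(with_explanations=True)
    # A positive gain means the explanations actually help the simulator.
    return explained - baseline
```

Under this reading, a positive gain indicates that the explanations communicate concepts the simulator can use, and ranking explanation methods by such a gain is the kind of comparison the paper reports across LLM simulators.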