Concept-based explainability for an EEG transformer model
Format: Article
Language: English
Abstract: Deep learning models are complex due to their size, structure, and inherent randomness in training procedures. Additional complexity arises from the selection of datasets and inductive biases. Addressing these challenges for explainability, Kim et al. (2018) introduced Concept Activation Vectors (CAVs), which aim to understand deep models' internal states in terms of human-aligned concepts. These concepts correspond to directions in latent space, identified using linear discriminants. Although this method was first applied to image classification, it was later adapted to other domains, including natural language processing. In this work, we attempt to apply the method to electroencephalogram (EEG) data for explainability in BENDR (Kostas et al., 2021), a large-scale transformer model. A crucial part of this endeavor involves defining the explanatory concepts and selecting relevant datasets to ground the concepts in the latent space. Our focus is on two mechanisms for EEG concept formation: the use of externally labeled EEG datasets, and the application of anatomically defined concepts. The former approach is a straightforward generalization of methods used in image classification, while the latter is novel and specific to EEG. We present evidence that both approaches to concept formation yield valuable insights into the representations learned by deep EEG models.
DOI: 10.48550/arxiv.2307.12745
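To illustrate the CAV idea described in the abstract, the sketch below shows the core computation from Kim et al. (2018): a linear discriminant is fit to separate activations of concept examples from activations of random examples, and the unit normal of its decision boundary serves as the concept direction in latent space. This is a minimal illustration, not the paper's actual pipeline; the activation arrays, the synthetic data, and the helper names `compute_cav` and `conceptual_sensitivity` are assumptions for this example, standing in for features extracted from a model layer such as a BENDR encoder block.

```python
# Minimal CAV sketch (after Kim et al., 2018); activations are hypothetical
# stand-ins for hidden-layer features of an EEG model.
import numpy as np
from sklearn.linear_model import LogisticRegression


def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear discriminant separating concept from random activations.

    The unit normal of the decision boundary is the Concept Activation
    Vector (CAV): a direction in latent space pointing toward the concept.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    w = clf.coef_.ravel()
    return w / np.linalg.norm(w)


def conceptual_sensitivity(grads: np.ndarray, cav: np.ndarray) -> np.ndarray:
    """Directional derivative of a class logit along the CAV.

    `grads` holds gradients of the class logit with respect to the layer
    activations (one row per example); a positive value means moving the
    activation toward the concept increases the class score.
    """
    return grads @ cav


# Synthetic example: 64-dimensional activations for 100 concept and 100 random inputs.
rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (100, 64)), rng.normal(0.0, 1.0, (100, 64)))
sens = conceptual_sensitivity(rng.normal(size=(10, 64)), cav)
print(cav.shape, float(np.mean(sens > 0)))  # fraction of positive sensitivities ~ TCAV score
```

In the full method, the fraction of examples with positive sensitivity over a dataset gives the TCAV score for a (concept, class, layer) triple; applying this to EEG would require choosing concept datasets, which is exactly the question the paper addresses with externally labeled and anatomically defined concepts.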