Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Online access: | Order full text |
Abstract: | Several studies have been carried out on revealing the linguistic features captured by BERT. This is usually achieved by training a diagnostic classifier on the representations obtained from different layers of BERT; the subsequent classification accuracy is then interpreted as the model's ability to encode the corresponding linguistic property. Despite providing insights, these studies have left out the potential role of token representations. In this paper, we provide a more in-depth analysis of BERT's representation space in search of distinct and meaningful subspaces that can explain these probing results. Based on a set of probing tasks, and with the help of attribution methods, we show that BERT tends to encode meaningful knowledge in specific token representations (which are often ignored in standard classification setups), allowing the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical-number and tense subspaces. |
---|---|
DOI: | 10.48550/arxiv.2104.01477 |
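The diagnostic-probing setup summarized in the abstract can be illustrated with a minimal sketch: extract hidden states from every BERT layer, train a simple classifier per layer on a labelled probing task, and read the resulting accuracy as evidence of what that layer encodes. This is not the authors' code; the bert-base-uncased checkpoint, the toy number-agreement sentences, the use of the [CLS] vector, and the logistic-regression probe are all assumptions made for illustration.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import BertModel, BertTokenizer

# Load BERT with hidden states exposed for every layer (embedding layer plus
# 12 Transformer layers for bert-base-uncased).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy probing task (illustrative labels): singular (0) vs. plural (1) subject.
sentences = [
    "The dog runs fast.",
    "The dogs run fast.",
    "A child plays outside.",
    "The children play outside.",
]
labels = [0, 1, 0, 1]

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, return_tensors="pt")
    # hidden_states[l] has shape (batch, seq_len, hidden_dim).
    hidden_states = model(**enc).hidden_states

for layer, states in enumerate(hidden_states):
    # Standard classification setup: probe only the [CLS] token representation;
    # the paper's point is that other, often-ignored token positions also
    # carry the relevant signal.
    feats = states[:, 0, :].numpy()
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    # A real probing study would report accuracy on held-out data; this tiny
    # set only demonstrates the mechanics.
    print(f"layer {layer:2d}: training accuracy = {probe.score(feats, labels):.2f}")
```

Swapping `states[:, 0, :]` for other token positions is how one would begin to examine the token-level subspaces the paper investigates.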