Exploring the interpretability of the BERT model for semantic similarity
Published in: | Journal of intelligent & fuzzy systems 2024-03, p.1-14 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | This study addresses sentence-level semantic similarity with the BERT model, comparing several aggregation techniques applied to the output of BERT: max-pooling, mean-pooling, and an LSTM network. The linguistic interpretability of the BERT-Base transformer model is then analyzed with an unsupervised learning approach, specifically dimensionality reduction using autoencoders and clustering algorithms, applied to the representation of the classification token (CLS). The results show that the CLS token achieves better abstractions than the proposed aggregation methods. In terms of interpretability, sequence length is relevant in the early layers, and its relevance decreases gradually across the layers. Attention to semantic similarity is concentrated in the intermediate and upper layers, especially layers 6, 8, 9, and 10. All findings were obtained on the semantic similarity task using the STS-Benchmark dataset. |
---|---|
ISSN: | 1064-1246 1875-8967 |
DOI: | 10.3233/JIFS-219359 |
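
The abstract compares the CLS token with mean-pooling and max-pooling over BERT's token outputs. This record does not give the authors' implementation, so the following is only a minimal sketch of what these three aggregations usually mean, written in plain Python on a toy matrix. The function names, the padding mask, and the toy values are assumptions made for illustration; a real setup would take the token vectors from a BERT model, and the LSTM variant is not shown.

```python
import math

def cls_pool(tokens, mask):
    """Return the vector at position 0, where BERT places the [CLS] token."""
    return list(tokens[0])

def mean_pool(tokens, mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    kept = [vec for vec, m in zip(tokens, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]

def max_pool(tokens, mask):
    """Take the element-wise maximum over the non-padding token vectors."""
    kept = [vec for vec, m in zip(tokens, mask) if m]
    return [max(col) for col in zip(*kept)]

def cosine(u, v):
    """Cosine similarity, a common way to compare two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-in for one sentence's token outputs: 4 positions x 3 dimensions.
# These numbers are invented; they are not BERT activations.
tokens = [
    [1.0, 0.0, 2.0],   # position 0: [CLS]
    [3.0, 4.0, 0.0],
    [5.0, -2.0, 1.0],
    [9.0, 9.0, 9.0],   # padding, excluded by the mask
]
mask = [1, 1, 1, 0]

sentence_vectors = {
    "cls": cls_pool(tokens, mask),
    "mean": mean_pool(tokens, mask),
    "max": max_pool(tokens, mask),
}
```

Each strategy turns a variable-length sequence of token vectors into one fixed-size sentence vector; two sentences can then be scored with `cosine` and the scores compared against the human similarity ratings of a dataset such as STS-Benchmark.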