Exploring the interpretability of the BERT model for semantic similarity

This study addresses the issue of semantic similarity in sentences using the BERT model through various aggregation techniques, such as max-pooling, mean-pooling, and an LSTM network applied to the output of the BERT model. Subsequently, the linguistic interpretability of the BERT-Base transformer m...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Journal of intelligent & fuzzy systems 2024-03, p.1-14
Hauptverfasser:	Ledesma Roque, Diana Anahí, Kolesnikova, Olga, Menchaca Méndez, Ricardo
Format:	Artikel
Sprache:	eng
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This study addresses the issue of semantic similarity in sentences using the BERT model through various aggregation techniques, such as max-pooling, mean-pooling, and an LSTM network applied to the output of the BERT model. Subsequently, the linguistic interpretability of the BERT-Base transformer model is analyzed through the unsupervised learning approach, specifically through dimensionality reduction using autoencoders and clustering algorithms, utilizing the representation of the classification token CLS. The results highlight that the CLS classification token achieves better abstractions than the proposed methods. In terms of interpretability, it is observed that sequence length is relevant in the early layers, with a gradual decrease across the layers. Additionally, attention to semantic similarity is concentrated in the intermediate and upper layers, especially in layers 6, 8, 9, and 10. All these findings were obtained by addressing the semantic similarity task using the STS-Benchmark dataset.
ISSN:	1064-1246 1875-8967
DOI:	10.3233/JIFS-219359