Semantic Structure and Interpretability of Word Embeddings

Bibliographic Details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018-10, Vol. 26 (10), p. 1769-1779
Main authors:	Senel, Lutfi Kerem; Utlu, Ihsan; Yucesoy, Veysel; Koc, Aykut; Cukur, Tolga
Format:	Article
Language:	English
Subjects:	Interpretability; Intrusion; Natural language processing; semantic structure; Semantics; Sparse matrices; Speech processing; Statistical analysis; Task analysis; Vector spaces; word embeddings
Description
Abstract:	Dense word embeddings, which encode meanings of words to low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performances in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, making interpretation of dimensions a big challenge. In this study, we propose a statistical method to uncover the underlying latent semantic structure in the dense word embeddings. To perform our analysis, we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of the word embeddings. The proposed method is a practical alternative to the classical word intrusion test that requires human intervention.
ISSN:	2329-9290, 2329-9304
DOI:	10.1109/TASLP.2018.2837384