A constrained optimization algorithm for learning GloVe embeddings with semantic lexicons

Bibliographic Details
Published in: Knowledge-Based Systems, 2020-05, Vol. 195, p. 105628, Article 105628
Authors: Sakketou, Flora; Ampazis, Nicholas
Format: Article
Language: English
Abstract: GloVe representations of words as vector embeddings in continuous spaces are learned from matrix factorization of the words' co-occurrence matrix constructed from large corpora. Due to their high quality as textual features, GloVe embeddings have been extensively utilized for many text mining and natural language processing tasks with considerable success. Further improvements of these word representations can be obtained by also taking into account the valuable information on the semantic properties of words and the complex relationships between them, as provided by semantic lexicons. In this paper, we adopt optimization techniques from the domain of machine learning with constrained optimization in order to leverage the relational knowledge between words, and we propose an efficient algorithm that produces word embeddings enhanced by the semantic information. The proposed algorithm outperforms other related approaches that utilize semantic information either during training or as a post-processing step. Our claims are validated by experiments on popular text mining and natural language processing tasks, including word similarities, word analogies, and sentiment analysis, which demonstrate that our proposed model can significantly improve the quality of word vector representations.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2020.105628
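The abstract describes GloVe training as a weighted factorization of the log co-occurrence matrix, enhanced with relational knowledge from a semantic lexicon. The general idea can be sketched as a standard GloVe SGD update plus an illustrative quadratic penalty that pulls lexicon-linked word vectors together. This is a minimal sketch under stated assumptions: the function name, hyperparameters, and penalty form are hypothetical choices for illustration, not the authors' constrained-optimization algorithm.

```python
import numpy as np

def glove_semantic_step(W, Wt, b, bt, X, lexicon_pairs,
                        lr=0.05, x_max=100.0, alpha=0.75, lam=0.1):
    """One SGD pass: standard GloVe updates over the nonzero co-occurrences,
    followed by a quadratic penalty lam * ||w_i - w_j||^2 for each pair of
    lexicon-related words (i, j). Illustrative sketch only."""
    for i, j in zip(*np.nonzero(X)):
        x = X[i, j]
        f = min(1.0, (x / x_max) ** alpha)            # GloVe weighting f(X_ij)
        err = W[i] @ Wt[j] + b[i] + bt[j] - np.log(x)  # factorization residual
        g = lr * f * err
        wi = W[i].copy()                               # simultaneous update
        W[i] -= g * Wt[j]
        Wt[j] -= g * wi
        b[i] -= g
        bt[j] -= g
    for i, j in lexicon_pairs:
        # gradient of lam * ||w_i - w_j||^2 pulls the two vectors together
        delta = 2.0 * lr * lam * (W[i] - W[j])
        W[i] -= delta
        W[j] += delta
    return W, Wt, b, bt
```

A retrofitting-style post-processing step, by contrast, would apply only the second loop to already-trained vectors; the paper's contribution is to incorporate the lexicon information during training via constrained optimization.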