Language Model Guided Knowledge Graph Embeddings
Published in: IEEE Access, 2022, Vol. 10, p. 76008-76020
Main authors: , , , , , ,
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract: Knowledge graph embedding models have become a popular approach for knowledge graph completion through predicting the plausibility of (potential) triples. This is performed by transforming the entities and relations of the knowledge graph into an embedding space. However, knowledge graphs often include further textual information stored in literals, which is ignored by such embedding models. As a consequence, the learning process is limited to the structure and the connections between the entities, which can negatively affect performance. We bridge this gap by leveraging the capabilities of pre-trained language models to include textual knowledge in the learning process of embedding models. This is achieved by introducing a new loss function that guides embedding models in measuring the likelihood of triples by taking such complementary knowledge into consideration. The proposed solution is a model-independent loss function that can be plugged into any knowledge graph embedding model. In this paper, Sentence-BERT and fastText are used as pre-trained language models from which the embeddings of the textual knowledge are obtained and injected into the loss function. The loss function contains a trainable slack variable that determines the degree to which the language models influence the plausibility of triples. Our experimental evaluation on six benchmarks, namely Nations, UMLS, WordNet, and three versions of CodEx, confirms the advantage of using pre-trained language models for boosting the accuracy of knowledge graph embedding models. We showcase this by performing evaluations on top of five well-known knowledge graph embedding models, namely TransE, RotatE, ComplEx, DistMult, and QuatE. The results show an improvement in accuracy of up to 9% on the UMLS dataset for the DistMult model and 4.2% on the Nations dataset for the ComplEx model when they are guided by pre-trained language models. We additionally studied the effect of factors such as the structure of the knowledge graphs and the number of training steps, and present the results as ablation studies.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3191666
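
The abstract describes a model-independent loss function that blends the structural score of a knowledge graph embedding model with textual evidence from a pre-trained language model, weighted by a trainable slack variable. The exact formulation is not given in the record, so the following is only a minimal sketch of that idea, assuming a margin-based ranking objective and plausibility scores where higher is better; the class name LMGuidedLoss, the parameter alpha, and the text-similarity inputs are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class LMGuidedLoss(nn.Module):
    """Hypothetical, model-independent loss wrapper (illustrative only).

    Combines the score a KGE model assigns to a triple with a similarity
    derived from language model embeddings of the entities' textual
    descriptions (e.g. Sentence-BERT or fastText). A trainable slack
    variable controls how strongly the language model guides the
    plausibility estimate.
    """

    def __init__(self, margin: float = 1.0):
        super().__init__()
        self.margin = margin
        # Trainable slack variable weighting the language-model signal.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(
        self,
        pos_score: torch.Tensor,     # KGE scores of true triples (higher = more plausible)
        neg_score: torch.Tensor,     # KGE scores of corrupted triples
        pos_text_sim: torch.Tensor,  # LM-based similarity for true triples
        neg_text_sim: torch.Tensor,  # LM-based similarity for corrupted triples
    ) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)  # keep the guidance weight in (0, 1)
        # Guided plausibility: structural score plus weighted textual evidence.
        pos = pos_score + a * pos_text_sim
        neg = neg_score + a * neg_text_sim
        # Margin ranking objective over the guided scores.
        return torch.relu(self.margin + neg - pos).mean()
```

In such a setup the textual similarities could be precomputed once, for example by encoding entity descriptions with Sentence-BERT through the sentence-transformers library (`SentenceTransformer.encode`) and taking cosine similarities of head and tail embeddings, so the wrapper adds negligible cost to training any of the base models named in the abstract.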