Variational autoencoder densified graph attention for fusing synonymous entities: Model and protocol

The prediction of missing links of open knowledge graphs (OpenKGs) poses unique challenges compared with well-studied curated knowledge graphs (CuratedKGs). Unlike CuratedKGs whose entities are fully disambiguated against a fixed vocabulary, OpenKGs consist of entities represented by non-canonicaliz...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Knowledge-based systems 2023-01, Vol.259, p.110061, Article 110061
Hauptverfasser: Li, Qian, Wang, Daling, Feng, Shi, Song, Kaisong, Zhang, Yifei, Yu, Ge
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The prediction of missing links of open knowledge graphs (OpenKGs) poses unique challenges compared with well-studied curated knowledge graphs (CuratedKGs). Unlike CuratedKGs whose entities are fully disambiguated against a fixed vocabulary, OpenKGs consist of entities represented by non-canonicalized free-form noun phrases and do not require an ontology specification, which drives the synonymity (multiple entities with different surface forms have the same meaning) and sparsity (a large portion of entities with few links). How to capture synonymous features in such sparse situations and how to evaluate the multiple answers pose challenges to existing models and evaluation protocols. In this paper, we propose VGAT, a variational autoencoder densified graph attention model to automatically mine synonymity features, and propose CR, a cluster ranking protocol to evaluate multiple answers in OpenKGs. For the model, VGAT investigates the following key ideas: (1) phrasal synonymity encoder attempts to capture phrasal features, which can automatically make entities with synonymous texts have closer representations; (2) neighbor synonymity encoder mines structural features with a graph attention network, which can recursively make entities with synonymous neighbors closer in representations. (3) densification attempts to densify the OpenKGs by generating similar embeddings and negative samples. For the protocol, CR is designed from the significance and compactness perspectives to comprehensively evaluate multiple answers. Extensive experiments and analysis show the effectiveness of the VGAT model and rationality of the CR protocol. •Propose a new model to automatically mine synonymous features.•Design a novel evaluation protocol to evaluate multiple answers.•Densify OpenKGs by variational autoencoder and negative samples.•Improve the performance of link prediction.
ISSN:0950-7051
1872-7409
DOI:10.1016/j.knosys.2022.110061