On the efficient execution of bounded Jaro-Winkler distances

Bibliographic details
Published in: Semantic Web 2017-01, Vol.8 (2), p.185-196
Main authors: Dreßler, Kevin; Ngonga Ngomo, Axel-Cyrille
Format: Article
Language: English
Online access: Full text
Description
Summary: Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10^6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.
ISSN: 1570-0844, 2210-4968
DOI: 10.3233/SW-150209