On the efficient execution of bounded Jaro-Winkler distances

Bibliographic details
Published in: Semantic Web, 2017-01, Vol. 8 (2), pp. 185-196
Main authors: Dreßler, Kevin; Ngonga Ngomo, Axel-Cyrille
Format: Article
Language: English
container_end_page 196
container_issue 2
container_start_page 185
container_title Semantic Web
container_volume 8
creator Dreßler, Kevin
Ngonga Ngomo, Axel-Cyrille
description Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitating the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10^6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.
doi_str_mv 10.3233/SW-150209
format Article
contributor Cheatham, Michelle
Pesquita, Catia
Cruz, Isabel F.
Euzenat, Jérôme
publisher IOS Press BV (Amsterdam)
rights Copyright IOS Press BV 2016
peer_reviewed yes
identifier ISSN: 1570-0844
ispartof Semantic Web, 2017-01, Vol.8 (2), p.185-196
issn 1570-0844 (print)
2210-4968 (online)
language eng
source EZB-FREE-00999 freely available EZB journals
subjects Datasets
Knowledge bases (artificial intelligence)
Silk
Strings
title On the efficient execution of bounded Jaro-Winkler distances
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A51%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20efficient%20execution%20of%20bounded%20Jaro-Winkler%20distances&rft.jtitle=Semantic%20Web&rft.au=Dre%C3%9Fler,%20Kevin&rft.date=2017-01-01&rft.volume=8&rft.issue=2&rft.spage=185&rft.epage=196&rft.pages=185-196&rft.issn=1570-0844&rft.eissn=2210-4968&rft_id=info:doi/10.3233/SW-150209&rft_dat=%3Cproquest_cross%3E1994005613%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1994005613&rft_id=info:pmid/&rfr_iscdi=true
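The description above refers to filters for *bounded* Jaro-Winkler measures, that is, computing the similarity only for string pairs that can still reach a threshold θ. As a minimal sketch, assuming the standard Jaro-Winkler definition (prefix scale p = 0.1, common prefix capped at 4 characters), one simple filter bounds the score from the string lengths alone: at most min(|s|, |t|) characters can match, so Jaro ≤ (min/|s| + min/|t| + 1)/3, and the Winkler boost is monotone in the Jaro score. This generic length filter is for illustration only and is not necessarily one of the filters proposed in the paper; all function names below are hypothetical.

```python
# Sketch only: threshold-bounded Jaro-Winkler with a generic length filter.
# Assumptions (not taken from the paper): standard Jaro-Winkler with
# prefix scale p = 0.1 and the common prefix capped at 4 characters.
from typing import Optional


def jaro(s: str, t: str) -> float:
    """Plain Jaro similarity in [0, 1]."""
    if not s and not t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters only match if they lie within this distance of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    m = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters taken in order; each mismatching position is half a transposition.
    s_chars = [c for c, ok in zip(s, s_matched) if ok]
    t_chars = [c for c, ok in zip(t, t_matched) if ok]
    transpositions = sum(a != b for a, b in zip(s_chars, t_chars)) // 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def length_upper_bound(len_s: int, len_t: int, p: float = 0.1, max_prefix: int = 4) -> float:
    """Upper bound on Jaro-Winkler computed from the two string lengths only."""
    if len_s == 0 or len_t == 0:
        return 1.0 if len_s == len_t else 0.0
    m = min(len_s, len_t)  # at most this many characters can match
    j_max = (m / len_s + m / len_t + 1) / 3  # best case: no transpositions
    return j_max + max_prefix * p * (1 - j_max)  # best case: full prefix boost


def bounded_jaro_winkler(s: str, t: str, theta: float) -> Optional[float]:
    """Return the Jaro-Winkler score if it is >= theta, otherwise None."""
    if length_upper_bound(len(s), len(t)) < theta:
        return None  # pruned without any character matching
    score = jaro_winkler(s, t)
    return score if score >= theta else None


if __name__ == "__main__":
    print(bounded_jaro_winkler("MARTHA", "MARHTA", 0.9))
    print(bounded_jaro_winkler("a", "abcdefghij", 0.9))  # pruned by the length bound
```

In this sketch the savings come from pairs whose length bound already falls below θ: they are discarded before the quadratic character matching runs, and the higher the threshold, the more pairs the bound removes.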