On the efficient execution of bounded Jaro-Winkler distances
Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery...
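Published in: | Semantic Web 2017-01, Vol.8 (2), p.185-196 |
---|---|
Main authors: | Dreßler, Kevin ; Ngonga Ngomo, Axel-Cyrille |
Format: | Article |
Language: | eng |
Subjects: | Datasets ; Knowledge bases (artificial intelligence) ; Silk ; Strings |
Online access: | Full text |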
container_end_page | 196 |
---|---|
container_issue | 2 |
container_start_page | 185 |
container_title | Semantic Web |
container_volume | 8 |
creator | Dreßler, Kevin ; Ngonga Ngomo, Axel-Cyrille |
description | Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10^6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets. |
doi_str_mv | 10.3233/SW-150209 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1994005613</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1994005613</sourcerecordid><originalsourceid>FETCH-LOGICAL-c257t-656d6f6b568aadefd1302f7b3726fec865bb08d35990d48abdd3123b0f5dba2d3</originalsourceid><addsrcrecordid>eNotkE1LAzEYhIMoWGoP_oOAJw_RfGyyWfAiRatS6KHKHkOyeYNba1KTXdB_70qdy1weZphB6JLRG8GFuN22hEnKaXOCZpwzSqpG6VM0Y7KmhOqqOkeLUnZ0kmRKaDlDd5uIh3fAEELf9RAHDN_QjUOfIk4BuzRGDx6_2JxI28ePPWTs-zLY2EG5QGfB7gss_n2O3h4fXpdPZL1ZPS_v16Tjsh6IksqroJxU2loPwTNBeaidqLkK0GklnaPaC9k01FfaOu8F48LRIL2z3Is5ujrmHnL6GqEMZpfGHKdKw5qmmtYoJibq-kh1OZWSIZhD7j9t_jGMmr9_zLY1x3_EL6NxVsY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1994005613</pqid></control><display><type>article</type><title>On the efficient execution of bounded Jaro-Winkler distances</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Dreßler, Kevin ; Ngonga Ngomo, Axel-Cyrille</creator><contributor>Cheatham, Michelle ; Pesquita, Catia ; Cruz, Isabel F. ; Euzenat, Jérôme ; Cruz, Isabel F. ; Pesquita, Catia ; Euzenat, Jérôme ; Cheatham, Michelle</contributor><creatorcontrib>Dreßler, Kevin ; Ngonga Ngomo, Axel-Cyrille ; Cheatham, Michelle ; Pesquita, Catia ; Cruz, Isabel F. ; Euzenat, Jérôme ; Cruz, Isabel F. ; Pesquita, Catia ; Euzenat, Jérôme ; Cheatham, Michelle</creatorcontrib><description>Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery of links between datasets on the Web of Data as well as their integration and fusion. 
We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10 6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.</description><identifier>ISSN: 1570-0844</identifier><identifier>EISSN: 2210-4968</identifier><identifier>DOI: 10.3233/SW-150209</identifier><language>eng</language><publisher>Amsterdam: IOS Press BV</publisher><subject>Datasets ; Knowledge bases (artificial intelligence) ; Silk ; Strings</subject><ispartof>Semantic Web, 2017-01, Vol.8 (2), p.185-196</ispartof><rights>Copyright IOS Press BV 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c257t-656d6f6b568aadefd1302f7b3726fec865bb08d35990d48abdd3123b0f5dba2d3</citedby><cites>FETCH-LOGICAL-c257t-656d6f6b568aadefd1302f7b3726fec865bb08d35990d48abdd3123b0f5dba2d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><contributor>Cheatham, Michelle</contributor><contributor>Pesquita, Catia</contributor><contributor>Cruz, Isabel F.</contributor><contributor>Euzenat, Jérôme</contributor><contributor>Cruz, Isabel F.</contributor><contributor>Pesquita, Catia</contributor><contributor>Euzenat, Jérôme</contributor><contributor>Cheatham, Michelle</contributor><creatorcontrib>Dreßler, Kevin</creatorcontrib><creatorcontrib>Ngonga Ngomo, Axel-Cyrille</creatorcontrib><title>On the efficient execution of bounded Jaro-Winkler distances</title><title>Semantic 
Web</title><description>Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10 6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.</description><subject>Datasets</subject><subject>Knowledge bases (artificial intelligence)</subject><subject>Silk</subject><subject>Strings</subject><issn>1570-0844</issn><issn>2210-4968</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNotkE1LAzEYhIMoWGoP_oOAJw_RfGyyWfAiRatS6KHKHkOyeYNba1KTXdB_70qdy1weZphB6JLRG8GFuN22hEnKaXOCZpwzSqpG6VM0Y7KmhOqqOkeLUnZ0kmRKaDlDd5uIh3fAEELf9RAHDN_QjUOfIk4BuzRGDx6_2JxI28ePPWTs-zLY2EG5QGfB7gss_n2O3h4fXpdPZL1ZPS_v16Tjsh6IksqroJxU2loPwTNBeaidqLkK0GklnaPaC9k01FfaOu8F48LRIL2z3Is5ujrmHnL6GqEMZpfGHKdKw5qmmtYoJibq-kh1OZWSIZhD7j9t_jGMmr9_zLY1x3_EL6NxVsY</recordid><startdate>20170101</startdate><enddate>20170101</enddate><creator>Dreßler, Kevin</creator><creator>Ngonga Ngomo, Axel-Cyrille</creator><general>IOS Press BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20170101</creationdate><title>On the efficient 
execution of bounded Jaro-Winkler distances</title><author>Dreßler, Kevin ; Ngonga Ngomo, Axel-Cyrille</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c257t-656d6f6b568aadefd1302f7b3726fec865bb08d35990d48abdd3123b0f5dba2d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Datasets</topic><topic>Knowledge bases (artificial intelligence)</topic><topic>Silk</topic><topic>Strings</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dreßler, Kevin</creatorcontrib><creatorcontrib>Ngonga Ngomo, Axel-Cyrille</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Semantic Web</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dreßler, Kevin</au><au>Ngonga Ngomo, Axel-Cyrille</au><au>Cheatham, Michelle</au><au>Pesquita, Catia</au><au>Cruz, Isabel F.</au><au>Euzenat, Jérôme</au><au>Cruz, Isabel F.</au><au>Pesquita, Catia</au><au>Euzenat, Jérôme</au><au>Cheatham, Michelle</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On the efficient execution of bounded Jaro-Winkler distances</atitle><jtitle>Semantic Web</jtitle><date>2017-01-01</date><risdate>2017</risdate><volume>8</volume><issue>2</issue><spage>185</spage><epage>196</epage><pages>185-196</pages><issn>1570-0844</issn><eissn>2210-4968</eissn><abstract>Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been 
regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitate the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10 6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we also show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.</abstract><cop>Amsterdam</cop><pub>IOS Press BV</pub><doi>10.3233/SW-150209</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1570-0844 |
ispartof | Semantic Web, 2017-01, Vol.8 (2), p.185-196 |
issn | 1570-0844 ; 2210-4968 |
language | eng |
recordid | cdi_proquest_journals_1994005613 |
source | EZB-FREE-00999 freely available EZB journals |
subjects | Datasets ; Knowledge bases (artificial intelligence) ; Silk ; Strings |
title | On the efficient execution of bounded Jaro-Winkler distances |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A51%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20efficient%20execution%20of%20bounded%20Jaro-Winkler%20distances&rft.jtitle=Semantic%20Web&rft.au=Dre%C3%9Fler,%20Kevin&rft.date=2017-01-01&rft.volume=8&rft.issue=2&rft.spage=185&rft.epage=196&rft.pages=185-196&rft.issn=1570-0844&rft.eissn=2210-4968&rft_id=info:doi/10.3233/SW-150209&rft_dat=%3Cproquest_cross%3E1994005613%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1994005613&rft_id=info:pmid/&rfr_iscdi=true |