SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets

State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the targe...

Detailed description

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2015-05, Vol.27 (5), p.1397-1440
Main authors: Araujo, Samur ; Duc Thanh Tran ; de Vries, Arjen P. ; Schwabe, Daniel
Format: Article
Language: eng
container_end_page 1440
container_issue 5
container_start_page 1397
container_title IEEE transactions on knowledge and data engineering
container_volume 27
creator Araujo, Samur
Duc Thanh Tran
de Vries, Arjen P.
Schwabe, Daniel
description State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.
doi_str_mv 10.1109/TKDE.2014.2365779
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_1669416708</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6940278</ieee_id><sourcerecordid>3643671331</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-7554b30556a273b647000f6de933e9be14a5dcc6391f680eb6dd3355fbe6d4e43</originalsourceid><addsrcrecordid>eNpdkM1Lw0AQxRdRsFb_APES8OIldSf7lXirbbXBFkXredkkk5qSJrqbHvzv3X6g4GmG4fcebx4hl0AHADS5XTyNJ4OIAh9ETAqlkiPSAyHiMIIEjv1OOYSccXVKzpxbUUpjFUOPvLxNXtN5eheMauNceG8cFsHcdPlH1SyDsrVB2rjONDn-XYe5bZ0LptihbZfYYLtxwdh0Xtu5c3JSmtrhxWH2yfvDZDGahrPnx3Q0nIU5k0kXKiF4xqgQ0kSKZZIrH6mUBSaMYZIhcCOKPJcsgVLGFDNZFIwJUWYoC46c9cnN3vfTtl8bdJ1eVy7Huja7PBqUjKhSccQ8ev0PXbUb2_h0GqRMOEhFY0_Bntp9Z7HUn7ZaG_utgeptx3rbsd52rA8de83VXlMh4i_vLWmkYvYD1zx2Ag</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1669416708</pqid></control><display><type>article</type><title>SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets</title><source>IEEE Electronic Library (IEL)</source><creator>Araujo, Samur ; Duc Thanh Tran ; de Vries, Arjen P. ; Schwabe, Daniel</creator><creatorcontrib>Araujo, Samur ; Duc Thanh Tran ; de Vries, Arjen P. ; Schwabe, Daniel</creatorcontrib><description>State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. 
Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2014.2365779</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Approximation methods ; Benchmark testing ; Benchmarks ; Class-Based matching ; Complexity theory ; Data integration ; Data models ; Direct matching ; Filtering ; Filtration ; Instance matching ; Matching ; Resource description framework ; Semantic Web ; Semantics ; State of the art ; Tasks</subject><ispartof>IEEE transactions on knowledge and data engineering, 2015-05, Vol.27 (5), p.1397-1440</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) May 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-7554b30556a273b647000f6de933e9be14a5dcc6391f680eb6dd3355fbe6d4e43</citedby><cites>FETCH-LOGICAL-c369t-7554b30556a273b647000f6de933e9be14a5dcc6391f680eb6dd3355fbe6d4e43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6940278$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6940278$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Araujo, Samur</creatorcontrib><creatorcontrib>Duc Thanh Tran</creatorcontrib><creatorcontrib>de Vries, Arjen P.</creatorcontrib><creatorcontrib>Schwabe, Daniel</creatorcontrib><title>SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. 
For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.</description><subject>Approximation methods</subject><subject>Benchmark testing</subject><subject>Benchmarks</subject><subject>Class-Based matching</subject><subject>Complexity theory</subject><subject>Data integration</subject><subject>Data models</subject><subject>Direct matching</subject><subject>Filtering</subject><subject>Filtration</subject><subject>Instance matching</subject><subject>Matching</subject><subject>Resource description framework</subject><subject>Semantic Web</subject><subject>Semantics</subject><subject>State of the art</subject><subject>Tasks</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkM1Lw0AQxRdRsFb_APES8OIldSf7lXirbbXBFkXredkkk5qSJrqbHvzv3X6g4GmG4fcebx4hl0AHADS5XTyNJ4OIAh9ETAqlkiPSAyHiMIIEjv1OOYSccXVKzpxbUUpjFUOPvLxNXtN5eheMauNceG8cFsHcdPlH1SyDsrVB2rjONDn-XYe5bZ0LptihbZfYYLtxwdh0Xtu5c3JSmtrhxWH2yfvDZDGahrPnx3Q0nIU5k0kXKiF4xqgQ0kSKZZIrH6mUBSaMYZIhcCOKPJcsgVLGFDNZFIwJUWYoC46c9cnN3vfTtl8bdJ1eVy7Huja7PBqUjKhSccQ8ev0PXbUb2_h0GqRMOEhFY0_Bntp9Z7HUn7ZaG_utgeptx3rbsd52rA8de83VXlMh4i_vLWmkYvYD1zx2Ag</recordid><startdate>20150501</startdate><enddate>20150501</enddate><creator>Araujo, Samur</creator><creator>Duc Thanh Tran</creator><creator>de Vries, Arjen P.</creator><creator>Schwabe, Daniel</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20150501</creationdate><title>SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets</title><author>Araujo, Samur ; Duc Thanh Tran ; de Vries, Arjen P. ; Schwabe, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-7554b30556a273b647000f6de933e9be14a5dcc6391f680eb6dd3355fbe6d4e43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Approximation methods</topic><topic>Benchmark testing</topic><topic>Benchmarks</topic><topic>Class-Based matching</topic><topic>Complexity theory</topic><topic>Data integration</topic><topic>Data models</topic><topic>Direct matching</topic><topic>Filtering</topic><topic>Filtration</topic><topic>Instance matching</topic><topic>Matching</topic><topic>Resource description framework</topic><topic>Semantic Web</topic><topic>Semantics</topic><topic>State of the art</topic><topic>Tasks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Araujo, Samur</creatorcontrib><creatorcontrib>Duc Thanh Tran</creatorcontrib><creatorcontrib>de Vries, Arjen P.</creatorcontrib><creatorcontrib>Schwabe, Daniel</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest 
Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Araujo, Samur</au><au>Duc Thanh Tran</au><au>de Vries, Arjen P.</au><au>Schwabe, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2015-05-01</date><risdate>2015</risdate><volume>27</volume><issue>5</issue><spage>1397</spage><epage>1440</epage><pages>1397-1440</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. 
For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2014.2365779</doi><tpages>44</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2015-05, Vol.27 (5), p.1397-1440
issn 1041-4347
1558-2191
language eng
recordid cdi_proquest_journals_1669416708
source IEEE Electronic Library (IEL)
subjects Approximation methods
Benchmark testing
Benchmarks
Class-Based matching
Complexity theory
Data integration
Data models
Direct matching
Filtering
Filtration
Instance matching
Matching
Resource description framework
Semantic Web
Semantics
State of the art
Tasks
title SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T02%3A37%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SERIMI:%20Class-Based%20Matching%20for%20Instance%20Matching%20Across%20Heterogeneous%20Datasets&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Araujo,%20Samur&rft.date=2015-05-01&rft.volume=27&rft.issue=5&rft.spage=1397&rft.epage=1440&rft.pages=1397-1440&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2014.2365779&rft_dat=%3Cproquest_RIE%3E3643671331%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1669416708&rft_id=info:pmid/&rft_ieee_id=6940278&rfr_iscdi=true
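
The class-based matching idea summarized in the description field above can be illustrated with a small sketch. This is not the SERIMI algorithm from the article; it is a hypothetical stand-in that assumes each target candidate is described by a set of feature strings and that class membership can be approximated by Jaccard similarity to candidates retrieved for the other source instances. The names `jaccard`, `class_score`, `refine_candidates`, the threshold of 0.5, and the example data are all assumptions made for illustration. Note that only target-side data is inspected, which mirrors the abstract's statement that no direct source-to-target comparison is involved.

```python
from typing import Dict, FrozenSet, List

Features = FrozenSet[str]


def jaccard(a: Features, b: Features) -> float:
    """Jaccard similarity of two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def class_score(candidate: Features, other_sets: List[List[Features]]) -> float:
    """Average, over the other candidate sets, of the best similarity
    between this candidate and any candidate in that set."""
    if not other_sets:
        return 0.0
    total = 0.0
    for other in other_sets:
        total += max((jaccard(candidate, o) for o in other), default=0.0)
    return total / len(other_sets)


def refine_candidates(
    candidate_sets: Dict[str, Dict[str, Features]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the article
) -> Dict[str, List[str]]:
    """Keep, per source instance, the target candidates that look like
    members of the shared class; source features are never consulted."""
    refined: Dict[str, List[str]] = {}
    for src, cands in candidate_sets.items():
        others = [list(c.values()) for k, c in candidate_sets.items() if k != src]
        refined[src] = [
            cid for cid, feats in cands.items()
            if class_score(feats, others) >= threshold
        ]
    return refined


# Hypothetical example: the class of interest is "countries".
candidate_sets = {
    "src:Georgia": {
        "tgt:Georgia_(country)": frozenset({"type:Country", "has:capital", "has:population"}),
        "tgt:Georgia_(U.S._state)": frozenset({"type:State", "has:governor", "has:population"}),
    },
    "src:Jordan": {
        "tgt:Jordan": frozenset({"type:Country", "has:capital", "has:population"}),
        "tgt:Michael_Jordan": frozenset({"type:Person", "has:birthDate"}),
    },
}

print(refine_candidates(candidate_sets))
```

In this toy input, the two country candidates resemble each other across candidate sets and are kept, while the U.S. state and the person are filtered out because they resemble nothing in the other set. A real system would need a more robust similarity measure and a principled way to choose the threshold.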