SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets
State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the targe...
Published in: | IEEE transactions on knowledge and data engineering 2015-05, Vol.27 (5), p.1397-1440 |
---|---|
Main authors: | Araujo, Samur; Duc Thanh Tran; de Vries, Arjen P.; Schwabe, Daniel |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1440 |
---|---|
container_issue | 5 |
container_start_page | 1397 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 27 |
creator | Araujo, Samur; Duc Thanh Tran; de Vries, Arjen P.; Schwabe, Daniel |
description | State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks. |
doi_str_mv | 10.1109/TKDE.2014.2365779 |
format | Article |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) May 2015 |
eissn | 1558-2191 |
coden | ITKEEH |
ieee_id | 6940278 |
date | 2015-05-01 |
genre | article |
tpages | 44 |
toplevel | peer_reviewed; online_resources |
oa | free_for_read |
linktohtml | https://ieeexplore.ieee.org/document/6940278 |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; ANTE: Abstracts in New Technology & Engineering; Engineering Research Database |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2015-05, Vol.27 (5), p.1397-1440 |
issn | 1041-4347 1558-2191 |
language | eng |
recordid | cdi_proquest_journals_1669416708 |
source | IEEE Electronic Library (IEL) |
subjects | Approximation methods; Benchmark testing; Benchmarks; Class-Based matching; Complexity theory; Data integration; Data models; Direct matching; Filtering; Filtration; Instance matching; Matching; Resource description framework; Semantic Web; Semantics; State of the art; Tasks |
title | SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T02%3A37%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SERIMI:%20Class-Based%20Matching%20for%20Instance%20Matching%20Across%20Heterogeneous%20Datasets&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Araujo,%20Samur&rft.date=2015-05-01&rft.volume=27&rft.issue=5&rft.spage=1397&rft.epage=1440&rft.pages=1397-1440&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2014.2365779&rft_dat=%3Cproquest_RIE%3E3643671331%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1669416708&rft_id=info:pmid/&rft_ieee_id=6940278&rfr_iscdi=true |
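
The refinement step described in the abstract (keeping only candidates that belong to the class of interest, using target data alone) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes, purely for illustration, that each target candidate is described by a set of feature tokens and that class membership can be approximated by how much a candidate overlaps with the candidates retrieved for the other source instances of the same class. The names `Features`, `CandidateSets`, `jaccard`, `class_based_filter`, the threshold of 0.3, and the example data are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: each target candidate is a set of feature
# tokens (for example its types and predicates) from the target dataset only.
Features = FrozenSet[str]
CandidateSets = Dict[str, Dict[str, Features]]


def jaccard(a: Features, b: Features) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_based_filter(
    candidate_sets: CandidateSets, threshold: float = 0.3
) -> Dict[str, List[str]]:
    """Keep candidates that resemble the candidates of the other source
    instances. No source-side data is compared (illustrative assumption)."""
    refined: Dict[str, List[str]] = {}
    for source_id, candidates in candidate_sets.items():
        # Feature sets of candidates retrieved for the *other* source instances.
        others = [
            feats
            for other_id, other_cands in candidate_sets.items()
            if other_id != source_id
            for feats in other_cands.values()
        ]
        kept: List[str] = []
        for cand_id, feats in candidates.items():
            # Mean overlap with the other candidate sets; 0.0 if there are none.
            score = (
                sum(jaccard(feats, o) for o in others) / len(others)
                if others
                else 0.0
            )
            if score >= threshold:
                kept.append(cand_id)
        refined[source_id] = kept
    return refined


# Made-up example: the class of interest is "cities" in the source dataset.
city = frozenset({"type:City", "has:population", "has:country"})
example: CandidateSets = {
    "src:Paris": {
        "tgt:Paris_city": city,
        "tgt:Paris_Hilton": frozenset({"type:Person", "has:birthDate"}),
    },
    "src:Berlin": {
        "tgt:Berlin_city": city,
        "tgt:Berlin_band": frozenset({"type:Band", "has:genre"}),
    },
}
refined = class_based_filter(example)
print(refined)
```

In this toy data the two city candidates resemble each other while the person and band candidates resemble nothing else, so only the city candidates survive. A class with a single source instance has no other candidate sets to compare against, so this sketch would drop all of its candidates; a real system would need a different fallback.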