Graph-based reference table construction to facilitate entity matching

► We model the reference table generation problem as a graph with an affinity property. ► We propose hierarchical clustering in entity matching to distinguish tokens. ► We develop a graph-based method of identifying synonyms to prove the accuracy of clustering. ► We develop pruning and partition techniques...

Detailed description

Bibliographic details
Published in:	The Journal of systems and software 2013-06, Vol.86 (6), p.1679-1688
Main authors:	Wang, Fangda; Wang, Hongzhi; Li, Jianzhong; Gao, Hong
Format:	Article
Language:	eng
Subjects:	Accuracy; Clusters; Computer programs; Computer science; Construction; Effectiveness studies; Efficiency; Entity matching; Graph clustering; Graph theory; Graphs; Iterative methods; Matching; Mathematical models; Reference table; Software; Tables (data)
Online access:	Full text

container_end_page	1688
container_issue	6
container_start_page	1679
container_title	The Journal of systems and software
container_volume	86
creator	Wang, Fangda; Wang, Hongzhi; Li, Jianzhong; Gao, Hong
description	► We model the reference table generation problem as a graph with an affinity property. ► We propose hierarchical clustering in entity matching to distinguish tokens. ► We develop a graph-based method of identifying synonyms to prove the accuracy of clustering. ► We develop pruning and partition techniques to achieve high performance. ► We propose a novel method of token weight decision. Entity matching plays a crucial role in information integration among heterogeneous data sources, and numerous solutions have been developed. Entity resolution based on a reference table has the benefits of high efficiency and ease of updating. In such methods, the reference table is important for effective entity matching. In this paper, we focus on constructing an effective reference table by relying on co-occurrence relationships between tokens to identify suitable entity names. To achieve high efficiency and accuracy, we first model the data set as a graph, and then cluster the vertices of the graph in two stages. Based on the connectivity between vertices, we also mine synonyms and obtain an expanded reference table. We develop an iterative system and conduct an experimental study using real data. Experimental results show that the proposed method achieves both high accuracy and efficiency.
doi_str_mv	10.1016/j.jss.2013.02.026
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1365155130</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0164121213000484</els_id><sourcerecordid>2951088621</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-d475fc53a2c2e05053b08a1d1619e61aa3e062aa4ac608622c1bb2b3069f1a1f3</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWD9-gLcFL162ziRNusWTiFWh4EXPYTY7a7Nsd2uSCv57U-rJgzAwc3jeYeYR4gphioDmtpt2MU4loJqCzGWOxASruSpRyupYTDIzyzPKU3EWYwcAcwlyIpZPgbbrsqbITRG45cCD4yJR3XPhxiGmsHPJj0ORxqIl53ufKHHBQ_Lpu9hQcms_fFyIk5b6yJe__Vy8Lx_fHp7L1evTy8P9qnRKV6lsZnPdOq1IOsmgQasaKsIGDS7YIJFiMJJoRs5AZaR0WNeyVmAWLRK26lzcHPZuw_i545jsxkfHfU8Dj7toURmNWqOCjF7_QbtxF4Z8XabUbCHnGkym8EC5MMaYBdht8BsK3xbB7s3azmazdm_Wgsy1z9wdMpw__fIcbHR-r63xgV2yzej_Sf8Ara2AIw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1334927506</pqid></control><display><type>article</type><title>Graph-based reference table construction to facilitate entity matching</title><source>Access via ScienceDirect (Elsevier)</source><creator>Wang, Fangda ; Wang, Hongzhi ; Li, Jianzhong ; Gao, Hong</creator><creatorcontrib>Wang, Fangda ; Wang, Hongzhi ; Li, Jianzhong ; Gao, Hong</creatorcontrib><description>► We models reference table generation problem as a graph with affinity property. ► We propose a hierarchy clustering in entity matching to distinguish tokens. ► We develop a graph-based method of identifying synonyms to prove the accuracy of clustering. ► We develop pruning and partition techniques to achieve high performance. ► We propose a novel method of token weight decision. Entity matching plays a crucial role in information integration among heterogeneous data sources, and numerous solutions have been developed. Entity resolution based on reference table has the benefits of high efficiency and being easy to update. 
In such kind of methods, the reference table is important for effective entity matching. In this paper, we focus on the construction of effective reference table by relying on co-occurring relationship between tokens to identify suitable entity names. To achieve high efficiency and accuracy, we first model data set as graph, and then cluster the vertices in the graph in two stages. Based on the connectivity between vertices, we also mine synonyms and get the expansive reference table. We develop an iterative system and conduct an experimental study using real data. Experimental results show that the method in this paper achieves both high accuracy and efficiency.</description><identifier>ISSN: 0164-1212</identifier><identifier>EISSN: 1873-1228</identifier><identifier>DOI: 10.1016/j.jss.2013.02.026</identifier><identifier>CODEN: JSSODM</identifier><language>eng</language><publisher>New York: Elsevier Inc</publisher><subject>Accuracy ; Clusters ; Computer programs ; Computer science ; Construction ; Effectiveness studies ; Efficiency ; Entity matching ; Graph clustering ; Graph theory ; Graphs ; Iterative methods ; Matching ; Mathematical models ; Reference table ; Software ; Tables (data)</subject><ispartof>The Journal of systems and software, 2013-06, Vol.86 (6), p.1679-1688</ispartof><rights>2013 Elsevier Inc.</rights><rights>Copyright Elsevier Sequoia S.A. 
Jun 2013</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c358t-d475fc53a2c2e05053b08a1d1619e61aa3e062aa4ac608622c1bb2b3069f1a1f3</citedby><cites>FETCH-LOGICAL-c358t-d475fc53a2c2e05053b08a1d1619e61aa3e062aa4ac608622c1bb2b3069f1a1f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jss.2013.02.026$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Wang, Fangda</creatorcontrib><creatorcontrib>Wang, Hongzhi</creatorcontrib><creatorcontrib>Li, Jianzhong</creatorcontrib><creatorcontrib>Gao, Hong</creatorcontrib><title>Graph-based reference table construction to facilitate entity matching</title><title>The Journal of systems and software</title><description>► We models reference table generation problem as a graph with affinity property. ► We propose a hierarchy clustering in entity matching to distinguish tokens. ► We develop a graph-based method of identifying synonyms to prove the accuracy of clustering. ► We develop pruning and partition techniques to achieve high performance. ► We propose a novel method of token weight decision. Entity matching plays a crucial role in information integration among heterogeneous data sources, and numerous solutions have been developed. Entity resolution based on reference table has the benefits of high efficiency and being easy to update. In such kind of methods, the reference table is important for effective entity matching. In this paper, we focus on the construction of effective reference table by relying on co-occurring relationship between tokens to identify suitable entity names. To achieve high efficiency and accuracy, we first model data set as graph, and then cluster the vertices in the graph in two stages. 
Based on the connectivity between vertices, we also mine synonyms and get the expansive reference table. We develop an iterative system and conduct an experimental study using real data. Experimental results show that the method in this paper achieves both high accuracy and efficiency.</description><subject>Accuracy</subject><subject>Clusters</subject><subject>Computer programs</subject><subject>Computer science</subject><subject>Construction</subject><subject>Effectiveness studies</subject><subject>Efficiency</subject><subject>Entity matching</subject><subject>Graph clustering</subject><subject>Graph theory</subject><subject>Graphs</subject><subject>Iterative methods</subject><subject>Matching</subject><subject>Mathematical models</subject><subject>Reference table</subject><subject>Software</subject><subject>Tables (data)</subject><issn>0164-1212</issn><issn>1873-1228</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWD9-gLcFL162ziRNusWTiFWh4EXPYTY7a7Nsd2uSCv57U-rJgzAwc3jeYeYR4gphioDmtpt2MU4loJqCzGWOxASruSpRyupYTDIzyzPKU3EWYwcAcwlyIpZPgbbrsqbITRG45cCD4yJR3XPhxiGmsHPJj0ORxqIl53ufKHHBQ_Lpu9hQcms_fFyIk5b6yJe__Vy8Lx_fHp7L1evTy8P9qnRKV6lsZnPdOq1IOsmgQasaKsIGDS7YIJFiMJJoRs5AZaR0WNeyVmAWLRK26lzcHPZuw_i545jsxkfHfU8Dj7toURmNWqOCjF7_QbtxF4Z8XabUbCHnGkym8EC5MMaYBdht8BsK3xbB7s3azmazdm_Wgsy1z9wdMpw__fIcbHR-r63xgV2yzej_Sf8Ara2AIw</recordid><startdate>201306</startdate><enddate>201306</enddate><creator>Wang, Fangda</creator><creator>Wang, Hongzhi</creator><creator>Li, Jianzhong</creator><creator>Gao, Hong</creator><general>Elsevier Inc</general><general>Elsevier Sequoia S.A</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201306</creationdate><title>Graph-based reference table construction to facilitate entity matching</title><author>Wang, 
Fangda ; Wang, Hongzhi ; Li, Jianzhong ; Gao, Hong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-d475fc53a2c2e05053b08a1d1619e61aa3e062aa4ac608622c1bb2b3069f1a1f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Accuracy</topic><topic>Clusters</topic><topic>Computer programs</topic><topic>Computer science</topic><topic>Construction</topic><topic>Effectiveness studies</topic><topic>Efficiency</topic><topic>Entity matching</topic><topic>Graph clustering</topic><topic>Graph theory</topic><topic>Graphs</topic><topic>Iterative methods</topic><topic>Matching</topic><topic>Mathematical models</topic><topic>Reference table</topic><topic>Software</topic><topic>Tables (data)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Fangda</creatorcontrib><creatorcontrib>Wang, Hongzhi</creatorcontrib><creatorcontrib>Li, Jianzhong</creatorcontrib><creatorcontrib>Gao, Hong</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>The Journal of systems and software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Fangda</au><au>Wang, Hongzhi</au><au>Li, Jianzhong</au><au>Gao, Hong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Graph-based reference table construction to facilitate entity matching</atitle><jtitle>The Journal of systems and 
software</jtitle><date>2013-06</date><risdate>2013</risdate><volume>86</volume><issue>6</issue><spage>1679</spage><epage>1688</epage><pages>1679-1688</pages><issn>0164-1212</issn><eissn>1873-1228</eissn><coden>JSSODM</coden><abstract>► We models reference table generation problem as a graph with affinity property. ► We propose a hierarchy clustering in entity matching to distinguish tokens. ► We develop a graph-based method of identifying synonyms to prove the accuracy of clustering. ► We develop pruning and partition techniques to achieve high performance. ► We propose a novel method of token weight decision. Entity matching plays a crucial role in information integration among heterogeneous data sources, and numerous solutions have been developed. Entity resolution based on reference table has the benefits of high efficiency and being easy to update. In such kind of methods, the reference table is important for effective entity matching. In this paper, we focus on the construction of effective reference table by relying on co-occurring relationship between tokens to identify suitable entity names. To achieve high efficiency and accuracy, we first model data set as graph, and then cluster the vertices in the graph in two stages. Based on the connectivity between vertices, we also mine synonyms and get the expansive reference table. We develop an iterative system and conduct an experimental study using real data. Experimental results show that the method in this paper achieves both high accuracy and efficiency.</abstract><cop>New York</cop><pub>Elsevier Inc</pub><doi>10.1016/j.jss.2013.02.026</doi><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0164-1212
ispartof	The Journal of systems and software, 2013-06, Vol.86 (6), p.1679-1688
issn	0164-1212 1873-1228
language	eng
recordid	cdi_proquest_miscellaneous_1365155130
source	Access via ScienceDirect (Elsevier)
subjects	Accuracy; Clusters; Computer programs; Computer science; Construction; Effectiveness studies; Efficiency; Entity matching; Graph clustering; Graph theory; Graphs; Iterative methods; Matching; Mathematical models; Reference table; Software; Tables (data)
title	Graph-based reference table construction to facilitate entity matching
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T01%3A28%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graph-based%20reference%20table%20construction%20to%20facilitate%20entity%20matching&rft.jtitle=The%20Journal%20of%20systems%20and%20software&rft.au=Wang,%20Fangda&rft.date=2013-06&rft.volume=86&rft.issue=6&rft.spage=1679&rft.epage=1688&rft.pages=1679-1688&rft.issn=0164-1212&rft.eissn=1873-1228&rft.coden=JSSODM&rft_id=info:doi/10.1016/j.jss.2013.02.026&rft_dat=%3Cproquest_cross%3E2951088621%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1334927506&rft_id=info:pmid/&rft_els_id=S0164121213000484&rfr_iscdi=true