Top-k Similarity Join in Heterogeneous Information Networks

As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is a...

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2015-06, Vol.27 (6), p.1710-1723
Main authors: Yun Xiong; Yangyong Zhu; Yu, Philip S.
Format: Article
Language: eng
container_end_page 1723
container_issue 6
container_start_page 1710
container_title IEEE transactions on knowledge and data engineering
container_volume 27
creator Yun Xiong
Yangyong Zhu
Yu, Philip S.
description As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
doi_str_mv 10.1109/TKDE.2014.2373385
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_1762072386</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6963491</ieee_id><sourcerecordid>3671971161</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</originalsourceid><addsrcrecordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1677626587</pqid></control><display><type>article</type><title>Top-k Similarity Join in Heterogeneous Information Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creator><creatorcontrib>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creatorcontrib><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. 
Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2014.2373385</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors</subject><ispartof>IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</citedby><cites>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><title>Top-k Similarity Join in Heterogeneous Information Networks</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. 
In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><subject>Data engineering</subject><subject>Data mining</subject><subject>graph</subject><subject>Heterogeneity</subject><subject>heterogeneous network</subject><subject>Indexing</subject><subject>Knowledge engineering</subject><subject>Links</subject><subject>Mathematical models</subject><subject>Networks</subject><subject>Search problems</subject><subject>Searching</subject><subject>Semantics</subject><subject>Similarity</subject><subject>similarity join</subject><subject>Vectors</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</recordid><startdate>20150601</startdate><enddate>20150601</enddate><creator>Yun Xiong</creator><creator>Yangyong Zhu</creator><creator>Yu, Philip S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20150601</creationdate><title>Top-k Similarity Join in Heterogeneous Information Networks</title><author>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Data engineering</topic><topic>Data mining</topic><topic>graph</topic><topic>Heterogeneity</topic><topic>heterogeneous network</topic><topic>Indexing</topic><topic>Knowledge engineering</topic><topic>Links</topic><topic>Mathematical models</topic><topic>Networks</topic><topic>Search problems</topic><topic>Searching</topic><topic>Semantics</topic><topic>Similarity</topic><topic>similarity join</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yun Xiong</au><au>Yangyong Zhu</au><au>Yu, Philip S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Top-k Similarity Join in Heterogeneous Information Networks</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2015-06-01</date><risdate>2015</risdate><volume>27</volume><issue>6</issue><spage>1710</spage><epage>1723</epage><pages>1710-1723</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. 
We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2014.2373385</doi><tpages>14</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723
issn 1041-4347
1558-2191
language eng
recordid cdi_proquest_miscellaneous_1762072386
source IEEE Electronic Library (IEL)
subjects Data engineering
Data mining
graph
Heterogeneity
heterogeneous network
Indexing
Knowledge engineering
Links
Mathematical models
Networks
Search problems
Searching
Semantics
Similarity
similarity join
Vectors
title Top-k Similarity Join in Heterogeneous Information Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T14%3A00%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Similarity%20Join%20in%20Heterogeneous%20Information%20Networks&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yun%20Xiong&rft.date=2015-06-01&rft.volume=27&rft.issue=6&rft.spage=1710&rft.epage=1723&rft.pages=1710-1723&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2014.2373385&rft_dat=%3Cproquest_RIE%3E3671971161%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1677626587&rft_id=info:pmid/&rft_ieee_id=6963491&rfr_iscdi=true