Top-k Similarity Join in Heterogeneous Information Networks

As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is a...

Bibliographic details
Published in:	IEEE transactions on knowledge and data engineering 2015-06, Vol.27 (6), p.1710-1723
Main authors:	Yun Xiong ; Yangyong Zhu ; Yu, Philip S.
Format:	Article
Language:	eng
Subjects:	Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors

container_end_page	1723
container_issue	6
container_start_page	1710
container_title	IEEE transactions on knowledge and data engineering
container_volume	27
creator	Yun Xiong ; Yangyong Zhu ; Yu, Philip S.
description	As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket-pruning-based locality-sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
doi_str_mv	10.1109/TKDE.2014.2373385
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_1762072386</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6963491</ieee_id><sourcerecordid>3671971161</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</originalsourceid><addsrcrecordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1677626587</pqid></control><display><type>article</type><title>Top-k Similarity Join in Heterogeneous Information Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creator><creatorcontrib>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creatorcontrib><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. 
Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2014.2373385</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors</subject><ispartof>IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</citedby><cites>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><title>Top-k Similarity Join in Heterogeneous Information Networks</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. 
In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><subject>Data engineering</subject><subject>Data mining</subject><subject>graph</subject><subject>Heterogeneity</subject><subject>heterogeneous network</subject><subject>Indexing</subject><subject>Knowledge engineering</subject><subject>Links</subject><subject>Mathematical models</subject><subject>Networks</subject><subject>Search problems</subject><subject>Searching</subject><subject>Semantics</subject><subject>Similarity</subject><subject>similarity join</subject><subject>Vectors</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</recordid><startdate>20150601</startdate><enddate>20150601</enddate><creator>Yun Xiong</creator><creator>Yangyong Zhu</creator><creator>Yu, Philip S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20150601</creationdate><title>Top-k Similarity Join in Heterogeneous Information Networks</title><author>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Data engineering</topic><topic>Data mining</topic><topic>graph</topic><topic>Heterogeneity</topic><topic>heterogeneous network</topic><topic>Indexing</topic><topic>Knowledge engineering</topic><topic>Links</topic><topic>Mathematical models</topic><topic>Networks</topic><topic>Search problems</topic><topic>Searching</topic><topic>Semantics</topic><topic>Similarity</topic><topic>similarity join</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yun Xiong</au><au>Yangyong Zhu</au><au>Yu, Philip S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Top-k Similarity Join in Heterogeneous Information Networks</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2015-06-01</date><risdate>2015</risdate><volume>27</volume><issue>6</issue><spage>1710</spage><epage>1723</epage><pages>1710-1723</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. 
We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2014.2373385</doi><tpages>14</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1041-4347
ispartof	IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723
issn	1041-4347 1558-2191
language	eng
recordid	cdi_proquest_miscellaneous_1762072386
source	IEEE Electronic Library (IEL)
subjects	Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors
title	Top-k Similarity Join in Heterogeneous Information Networks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T14%3A00%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Similarity%20Join%20in%20Heterogeneous%20Information%20Networks&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yun%20Xiong&rft.date=2015-06-01&rft.volume=27&rft.issue=6&rft.spage=1710&rft.epage=1723&rft.pages=1710-1723&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2014.2373385&rft_dat=%3Cproquest_RIE%3E3671971161%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1677626587&rft_id=info:pmid/&rft_ieee_id=6963491&rfr_iscdi=true