Top-k Similarity Join in Heterogeneous Information Networks
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is a...
Published in: | IEEE transactions on knowledge and data engineering 2015-06, Vol.27 (6), p.1710-1723 |
---|---|
Main authors: | Yun Xiong ; Yangyong Zhu ; Yu, Philip S. |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1723 |
---|---|
container_issue | 6 |
container_start_page | 1710 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 27 |
creator | Yun Xiong ; Yangyong Zhu ; Yu, Philip S. |
description | As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach. |
doi_str_mv | 10.1109/TKDE.2014.2373385 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_1762072386</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6963491</ieee_id><sourcerecordid>3671971161</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</originalsourceid><addsrcrecordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1677626587</pqid></control><display><type>article</type><title>Top-k Similarity Join in Heterogeneous Information Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creator><creatorcontrib>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</creatorcontrib><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. 
Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2014.2373385</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors</subject><ispartof>IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2015</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</citedby><cites>FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6963491$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><title>Top-k Similarity Join in Heterogeneous Information Networks</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. 
In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</description><subject>Data engineering</subject><subject>Data mining</subject><subject>graph</subject><subject>Heterogeneity</subject><subject>heterogeneous network</subject><subject>Indexing</subject><subject>Knowledge engineering</subject><subject>Links</subject><subject>Mathematical models</subject><subject>Networks</subject><subject>Search problems</subject><subject>Searching</subject><subject>Semantics</subject><subject>Similarity</subject><subject>similarity join</subject><subject>Vectors</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE9LAzEQxYMoWKsfQLwsePGyNbP5jyep1apFD9ZzyK6JbLvd1GSL9NubtcWDMDAD83uPx0PoHPAIAKvr-fPdZFRgoKOCCEIkO0ADYEzmBSg4TDemkFNCxTE6iXGBMZZCwgDdzP06X2Zv9apuTKi7bfbk6zZLM7WdDf7TttZvYvbYOh9Wpqt9m73Y7tuHZTxFR8400Z7t9xC930_m42k-e314HN_O8oqoossLRisloRTcCuUKZwWVtJIfIHh6lbwUrKycMMAsM44JqCR2lJjSGFCCKTJEVzvfdfBfGxs7vapjZZvG_GbTvREWBZE8oZf_0IXfhDal08BF4jiTIlGwo6rgYwzW6XWoVyZsNWDd16n7OnVfp97XmTQXO01trf3jueKEKiA_OltveQ</recordid><startdate>20150601</startdate><enddate>20150601</enddate><creator>Yun Xiong</creator><creator>Yangyong Zhu</creator><creator>Yu, Philip S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20150601</creationdate><title>Top-k Similarity Join in Heterogeneous Information Networks</title><author>Yun Xiong ; Yangyong Zhu ; Yu, Philip S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-254c981b76e79f2fe7484c8d176254b6b75bcf7a15e5af571c80f43abaa197593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Data engineering</topic><topic>Data mining</topic><topic>graph</topic><topic>Heterogeneity</topic><topic>heterogeneous network</topic><topic>Indexing</topic><topic>Knowledge engineering</topic><topic>Links</topic><topic>Mathematical models</topic><topic>Networks</topic><topic>Search problems</topic><topic>Searching</topic><topic>Semantics</topic><topic>Similarity</topic><topic>similarity join</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yun Xiong</creatorcontrib><creatorcontrib>Yangyong Zhu</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yun Xiong</au><au>Yangyong Zhu</au><au>Yu, Philip S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Top-k Similarity Join in Heterogeneous Information Networks</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2015-06-01</date><risdate>2015</risdate><volume>27</volume><issue>6</issue><spage>1710</spage><epage>1723</epage><pages>1710-1723</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. Especially, none of the existing research on similarity join takes different semantic meanings behind paths into consideration and almost all completely ignore the heterogeneity and diversity of the HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user specified join path in a heterogeneous information network. 
We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with existing Link-based Similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2014.2373385</doi><tpages>14</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2015-06, Vol.27 (6), p.1710-1723 |
issn | 1041-4347 ; 1558-2191 |
language | eng |
recordid | cdi_proquest_miscellaneous_1762072386 |
source | IEEE Electronic Library (IEL) |
subjects | Data engineering ; Data mining ; graph ; Heterogeneity ; heterogeneous network ; Indexing ; Knowledge engineering ; Links ; Mathematical models ; Networks ; Search problems ; Searching ; Semantics ; Similarity ; similarity join ; Vectors |
title | Top-k Similarity Join in Heterogeneous Information Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T14%3A00%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Similarity%20Join%20in%20Heterogeneous%20Information%20Networks&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yun%20Xiong&rft.date=2015-06-01&rft.volume=27&rft.issue=6&rft.spage=1710&rft.epage=1723&rft.pages=1710-1723&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2014.2373385&rft_dat=%3Cproquest_RIE%3E3671971161%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1677626587&rft_id=info:pmid/&rft_ieee_id=6963491&rfr_iscdi=true |