Efficient Correlation Search from Graph Databases

Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new proble...

Detailed description

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2008-12, Vol.20 (12), p.1601-1615
Main authors: Yiping Ke, Cheng, J., Ng, W.
Format: Article
Language: eng
container_end_page 1615
container_issue 12
container_start_page 1601
container_title IEEE transactions on knowledge and data engineering
container_volume 20
creator Yiping Ke
Cheng, J.
Ng, W.
description Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.
doi_str_mv 10.1109/TKDE.2008.86
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pascalfrancis_primary_20850808</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4515864</ieee_id><sourcerecordid>875067874</sourcerecordid><originalsourceid>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</originalsourceid><addsrcrecordid>eNp90D1PwzAQBmALgUQpbGwsERKwkOKvxOcRteVDVGKgzNbFsdVUaVLsdODfk9KqAwOTT_Jzr3QvIZeMjhij-mH-NpmOOKUwgvyIDFiWQcqZZsf9TCVLpZDqlJzFuKQ9UsAGhE29r2zlmi4ZtyG4GruqbZIPh8EuEh_aVfIccL1IJthhgdHFc3LisY7uYv8OyefTdD5-SWfvz6_jx1lqJRNdipJKi6BLBZ5LnhVWlIWgqIUtPC2Et8g8V2itAAtCU67KUjsNUCpqlRRDcrfLXYf2a-NiZ1ZVtK6usXHtJhpQGc0V_Mrbf6WQGTBQeQ-v_8BluwlNf4XRjHOlpRY9ut8hG9oYg_NmHaoVhm_DqNnWbLY1m23NBraZN_tMjBZrH7CxVTzscAoZBQq9u9q5yjl3-JYZyyCX4gf9z4ON</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>912279493</pqid></control><display><type>article</type><title>Efficient Correlation Search from Graph Databases</title><source>IEEE Electronic Library (IEL)</source><creator>Yiping Ke ; Cheng, J. ; Ng, W.</creator><creatorcontrib>Yiping Ke ; Cheng, J. ; Ng, W.</creatorcontrib><description>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. 
With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2008.86</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Algorithms ; Applied sciences ; Chemicals ; Chemistry ; Computational biology ; Computer science; control theory; systems ; Correlation ; Data mining ; Data models ; Data processing. List processing. Character string processing ; Drugs ; Exact sciences and technology ; Graphs ; Information retrieval. Graph ; Memory organisation. Data processing ; Mines ; Mining ; Mining methods and algorithms ; Multimedia databases ; Queries ; Searching ; Software ; Streaming media ; Studies ; Theoretical computing ; Transaction databases ; XML</subject><ispartof>IEEE transactions on knowledge and data engineering, 2008-12, Vol.20 (12), p.1601-1615</ispartof><rights>2009 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2008</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</citedby><cites>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4515864$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4515864$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=20850808$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Yiping Ke</creatorcontrib><creatorcontrib>Cheng, J.</creatorcontrib><creatorcontrib>Ng, W.</creatorcontrib><title>Efficient Correlation Search from Graph Databases</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. 
With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Chemicals</subject><subject>Chemistry</subject><subject>Computational biology</subject><subject>Computer science; control theory; systems</subject><subject>Correlation</subject><subject>Data mining</subject><subject>Data models</subject><subject>Data processing. List processing. Character string processing</subject><subject>Drugs</subject><subject>Exact sciences and technology</subject><subject>Graphs</subject><subject>Information retrieval. Graph</subject><subject>Memory organisation. 
Data processing</subject><subject>Mines</subject><subject>Mining</subject><subject>Mining methods and algorithms</subject><subject>Multimedia databases</subject><subject>Queries</subject><subject>Searching</subject><subject>Software</subject><subject>Streaming media</subject><subject>Studies</subject><subject>Theoretical computing</subject><subject>Transaction databases</subject><subject>XML</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNp90D1PwzAQBmALgUQpbGwsERKwkOKvxOcRteVDVGKgzNbFsdVUaVLsdODfk9KqAwOTT_Jzr3QvIZeMjhij-mH-NpmOOKUwgvyIDFiWQcqZZsf9TCVLpZDqlJzFuKQ9UsAGhE29r2zlmi4ZtyG4GruqbZIPh8EuEh_aVfIccL1IJthhgdHFc3LisY7uYv8OyefTdD5-SWfvz6_jx1lqJRNdipJKi6BLBZ5LnhVWlIWgqIUtPC2Et8g8V2itAAtCU67KUjsNUCpqlRRDcrfLXYf2a-NiZ1ZVtK6usXHtJhpQGc0V_Mrbf6WQGTBQeQ-v_8BluwlNf4XRjHOlpRY9ut8hG9oYg_NmHaoVhm_DqNnWbLY1m23NBraZN_tMjBZrH7CxVTzscAoZBQq9u9q5yjl3-JYZyyCX4gf9z4ON</recordid><startdate>20081201</startdate><enddate>20081201</enddate><creator>Yiping Ke</creator><creator>Cheng, J.</creator><creator>Ng, W.</creator><general>IEEE</general><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20081201</creationdate><title>Efficient Correlation Search from Graph Databases</title><author>Yiping Ke ; Cheng, J. 
; Ng, W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Chemicals</topic><topic>Chemistry</topic><topic>Computational biology</topic><topic>Computer science; control theory; systems</topic><topic>Correlation</topic><topic>Data mining</topic><topic>Data models</topic><topic>Data processing. List processing. Character string processing</topic><topic>Drugs</topic><topic>Exact sciences and technology</topic><topic>Graphs</topic><topic>Information retrieval. Graph</topic><topic>Memory organisation. Data processing</topic><topic>Mines</topic><topic>Mining</topic><topic>Mining methods and algorithms</topic><topic>Multimedia databases</topic><topic>Queries</topic><topic>Searching</topic><topic>Software</topic><topic>Streaming media</topic><topic>Studies</topic><topic>Theoretical computing</topic><topic>Transaction databases</topic><topic>XML</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yiping Ke</creatorcontrib><creatorcontrib>Cheng, J.</creatorcontrib><creatorcontrib>Ng, W.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems 
Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yiping Ke</au><au>Cheng, J.</au><au>Ng, W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient Correlation Search from Graph Databases</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2008-12-01</date><risdate>2008</risdate><volume>20</volume><issue>12</issue><spage>1601</spage><epage>1615</epage><pages>1601-1615</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. 
We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TKDE.2008.86</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2008-12, Vol.20 (12), p.1601-1615
issn 1041-4347
1558-2191
language eng
recordid cdi_pascalfrancis_primary_20850808
source IEEE Electronic Library (IEL)
subjects Algorithms
Applied sciences
Chemicals
Chemistry
Computational biology
Computer science; control theory; systems
Correlation
Data mining
Data models
Data processing. List processing. Character string processing
Drugs
Exact sciences and technology
Graphs
Information retrieval. Graph
Memory organisation. Data processing
Mines
Mining
Mining methods and algorithms
Multimedia databases
Queries
Searching
Software
Streaming media
Studies
Theoretical computing
Transaction databases
XML
title Efficient Correlation Search from Graph Databases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T16%3A50%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Correlation%20Search%20from%20Graph%20Databases&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yiping%20Ke&rft.date=2008-12-01&rft.volume=20&rft.issue=12&rft.spage=1601&rft.epage=1615&rft.pages=1601-1615&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2008.86&rft_dat=%3Cproquest_RIE%3E875067874%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912279493&rft_id=info:pmid/&rft_ieee_id=4515864&rfr_iscdi=true
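The description above states that CGS adopts Pearson's correlation coefficient to account for the occurrence distributions of graphs. As a minimal sketch, assuming each graph's occurrence is treated as a binary indicator over the graphs in the database (so that Pearson's coefficient reduces to the phi coefficient computed from occurrence counts), the measure could be computed as follows. The function name, its parameters, and the zero-variance convention are illustrative assumptions, not taken from the article.

```python
from math import sqrt


def occurrence_correlation(db_size, count_a, count_b, count_both):
    """Pearson correlation of two binary occurrence indicators.

    db_size    -- number of graphs in the database (hypothetical parameter)
    count_a    -- number of database graphs that contain graph A
    count_b    -- number of database graphs that contain graph B
    count_both -- number of database graphs that contain both A and B
    """
    if db_size <= 0:
        raise ValueError("db_size must be positive")

    # Occurrence probabilities estimated as relative frequencies.
    p_a = count_a / db_size
    p_b = count_b / db_size
    p_ab = count_both / db_size

    denominator = sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))
    if denominator == 0:
        # Assumed convention: a graph that occurs in no or in every
        # database graph has zero variance, so report no correlation.
        return 0.0

    return (p_ab - p_a * p_b) / denominator
```

For example, two graphs that always occur together in half of the database yield a coefficient of 1, while two graphs that occur independently yield 0. This sketch only evaluates the measure for one known pair; it does not reproduce the candidate generation, bounds, or heuristic rules that the article describes.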