Efficient Correlation Search from Graph Databases
Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new proble...
Published in: | IEEE transactions on knowledge and data engineering 2008-12, Vol.20 (12), p.1601-1615 |
---|---|
Main authors: | Yiping Ke, J. Cheng, W. Ng |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1615 |
---|---|
container_issue | 12 |
container_start_page | 1601 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 20 |
creator | Yiping Ke ; Cheng, J. ; Ng, W. |
description | Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures. |
doi_str_mv | 10.1109/TKDE.2008.86 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pascalfrancis_primary_20850808</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4515864</ieee_id><sourcerecordid>875067874</sourcerecordid><originalsourceid>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</originalsourceid><addsrcrecordid>eNp90D1PwzAQBmALgUQpbGwsERKwkOKvxOcRteVDVGKgzNbFsdVUaVLsdODfk9KqAwOTT_Jzr3QvIZeMjhij-mH-NpmOOKUwgvyIDFiWQcqZZsf9TCVLpZDqlJzFuKQ9UsAGhE29r2zlmi4ZtyG4GruqbZIPh8EuEh_aVfIccL1IJthhgdHFc3LisY7uYv8OyefTdD5-SWfvz6_jx1lqJRNdipJKi6BLBZ5LnhVWlIWgqIUtPC2Et8g8V2itAAtCU67KUjsNUCpqlRRDcrfLXYf2a-NiZ1ZVtK6usXHtJhpQGc0V_Mrbf6WQGTBQeQ-v_8BluwlNf4XRjHOlpRY9ut8hG9oYg_NmHaoVhm_DqNnWbLY1m23NBraZN_tMjBZrH7CxVTzscAoZBQq9u9q5yjl3-JYZyyCX4gf9z4ON</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>912279493</pqid></control><display><type>article</type><title>Efficient Correlation Search from Graph Databases</title><source>IEEE Electronic Library (IEL)</source><creator>Yiping Ke ; Cheng, J. ; Ng, W.</creator><creatorcontrib>Yiping Ke ; Cheng, J. ; Ng, W.</creatorcontrib><description>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. 
With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2008.86</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Algorithms ; Applied sciences ; Chemicals ; Chemistry ; Computational biology ; Computer science; control theory; systems ; Correlation ; Data mining ; Data models ; Data processing. List processing. Character string processing ; Drugs ; Exact sciences and technology ; Graphs ; Information retrieval. Graph ; Memory organisation. Data processing ; Mines ; Mining ; Mining methods and algorithms ; Multimedia databases ; Queries ; Searching ; Software ; Streaming media ; Studies ; Theoretical computing ; Transaction databases ; XML</subject><ispartof>IEEE transactions on knowledge and data engineering, 2008-12, Vol.20 (12), p.1601-1615</ispartof><rights>2009 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2008</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</citedby><cites>FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4515864$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4515864$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20850808$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Yiping Ke</creatorcontrib><creatorcontrib>Cheng, J.</creatorcontrib><creatorcontrib>Ng, W.</creatorcontrib><title>Efficient Correlation Search from Graph Databases</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. 
With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Chemicals</subject><subject>Chemistry</subject><subject>Computational biology</subject><subject>Computer science; control theory; systems</subject><subject>Correlation</subject><subject>Data mining</subject><subject>Data models</subject><subject>Data processing. List processing. Character string processing</subject><subject>Drugs</subject><subject>Exact sciences and technology</subject><subject>Graphs</subject><subject>Information retrieval. Graph</subject><subject>Memory organisation. 
Data processing</subject><subject>Mines</subject><subject>Mining</subject><subject>Mining methods and algorithms</subject><subject>Multimedia databases</subject><subject>Queries</subject><subject>Searching</subject><subject>Software</subject><subject>Streaming media</subject><subject>Studies</subject><subject>Theoretical computing</subject><subject>Transaction databases</subject><subject>XML</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNp90D1PwzAQBmALgUQpbGwsERKwkOKvxOcRteVDVGKgzNbFsdVUaVLsdODfk9KqAwOTT_Jzr3QvIZeMjhij-mH-NpmOOKUwgvyIDFiWQcqZZsf9TCVLpZDqlJzFuKQ9UsAGhE29r2zlmi4ZtyG4GruqbZIPh8EuEh_aVfIccL1IJthhgdHFc3LisY7uYv8OyefTdD5-SWfvz6_jx1lqJRNdipJKi6BLBZ5LnhVWlIWgqIUtPC2Et8g8V2itAAtCU67KUjsNUCpqlRRDcrfLXYf2a-NiZ1ZVtK6usXHtJhpQGc0V_Mrbf6WQGTBQeQ-v_8BluwlNf4XRjHOlpRY9ut8hG9oYg_NmHaoVhm_DqNnWbLY1m23NBraZN_tMjBZrH7CxVTzscAoZBQq9u9q5yjl3-JYZyyCX4gf9z4ON</recordid><startdate>20081201</startdate><enddate>20081201</enddate><creator>Yiping Ke</creator><creator>Cheng, J.</creator><creator>Ng, W.</creator><general>IEEE</general><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20081201</creationdate><title>Efficient Correlation Search from Graph Databases</title><author>Yiping Ke ; Cheng, J. 
; Ng, W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c413t-a404ca89d78f2425bc3db30a93cbf0b3fca1f27acc38c839027dd9e988d70c743</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Chemicals</topic><topic>Chemistry</topic><topic>Computational biology</topic><topic>Computer science; control theory; systems</topic><topic>Correlation</topic><topic>Data mining</topic><topic>Data models</topic><topic>Data processing. List processing. Character string processing</topic><topic>Drugs</topic><topic>Exact sciences and technology</topic><topic>Graphs</topic><topic>Information retrieval. Graph</topic><topic>Memory organisation. Data processing</topic><topic>Mines</topic><topic>Mining</topic><topic>Mining methods and algorithms</topic><topic>Multimedia databases</topic><topic>Queries</topic><topic>Searching</topic><topic>Software</topic><topic>Streaming media</topic><topic>Studies</topic><topic>Theoretical computing</topic><topic>Transaction databases</topic><topic>XML</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yiping Ke</creatorcontrib><creatorcontrib>Cheng, J.</creatorcontrib><creatorcontrib>Ng, W.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yiping Ke</au><au>Cheng, J.</au><au>Ng, W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient Correlation Search from Graph Databases</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2008-12-01</date><risdate>2008</risdate><volume>20</volume><issue>12</issue><spage>1601</spage><epage>1615</epage><pages>1601-1615</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called correlated graph search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. 
We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TKDE.2008.86</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2008-12, Vol.20 (12), p.1601-1615 |
issn | 1041-4347 ; 1558-2191 |
language | eng |
recordid | cdi_pascalfrancis_primary_20850808 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Applied sciences ; Chemicals ; Chemistry ; Computational biology ; Computer science; control theory; systems ; Correlation ; Data mining ; Data models ; Data processing. List processing. Character string processing ; Drugs ; Exact sciences and technology ; Graphs ; Information retrieval. Graph ; Memory organisation. Data processing ; Mines ; Mining ; Mining methods and algorithms ; Multimedia databases ; Queries ; Searching ; Software ; Streaming media ; Studies ; Theoretical computing ; Transaction databases ; XML |
title | Efficient Correlation Search from Graph Databases |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T16%3A50%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Correlation%20Search%20from%20Graph%20Databases&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yiping%20Ke&rft.date=2008-12-01&rft.volume=20&rft.issue=12&rft.spage=1601&rft.epage=1615&rft.pages=1601-1615&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2008.86&rft_dat=%3Cproquest_RIE%3E875067874%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912279493&rft_id=info:pmid/&rft_ieee_id=4515864&rfr_iscdi=true |
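
The description above says that CGS uses Pearson's correlation coefficient over the occurrence distributions of graphs. As a minimal sketch of that measure only (not the paper's search algorithm, its bounds, or its heuristic rules), the following assumes each graph's occurrences are already available as a set of database graph IDs, which sidesteps subgraph matching entirely. The function name `pearson_occurrence` and its parameters are hypothetical and chosen for illustration. For two binary occurrence indicators, Pearson's coefficient reduces to the standard phi coefficient computed from the individual and joint supports.

```python
from math import sqrt


def pearson_occurrence(occ_a, occ_b, db_size):
    """Pearson correlation of two binary occurrence indicators.

    occ_a, occ_b: sets of IDs of the database graphs containing each
                  pattern (hypothetical representation, assumed given).
    db_size:      total number of graphs in the database.
    """
    if db_size <= 0:
        raise ValueError("db_size must be positive")

    # Supports are occurrence probabilities in the database.
    supp_a = len(occ_a) / db_size
    supp_b = len(occ_b) / db_size
    supp_ab = len(occ_a & occ_b) / db_size  # joint occurrence

    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    if denom == 0:
        # Assumed convention: a pattern occurring in none or all of the
        # graphs carries no correlation information, so return 0.
        return 0.0
    return (supp_ab - supp_a * supp_b) / denom


if __name__ == "__main__":
    query = {0, 1, 2, 3}
    candidate = {0, 1, 4, 5}
    print(pearson_occurrence(query, candidate, db_size=8))
```

A threshold on this value would then decide whether a candidate counts as correlated with the query; how candidates are generated and pruned is the subject of the article itself and is not reproduced here.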