Defining and Evaluating Network Communities Based on Ground-Truth

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of alg...

Bibliographic details
Main authors: Jaewon Yang; Leskovec, J.
Format: Conference proceeding
Language: English
container_end_page 754
container_issue
container_start_page 745
container_title
container_volume
creator Jaewon Yang
Leskovec, J.
description Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task, mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation, and the lack of a reliable gold-standard ground-truth. In this paper we study a set of 230 large real-world social, collaboration and information networks where nodes explicitly state their group memberships. For example, in social networks nodes explicitly join various interest-based social groups. We use such groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate how different structural definitions of network communities correspond to ground-truth communities. We choose 13 commonly used structural definitions of network communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities. We also investigate the task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than a hundred million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods.
doi_str_mv 10.1109/ICDM.2012.138
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6413740</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6413740</ieee_id><sourcetype>Publisher</sourcetype><recordtype>conference_proceeding</recordtype></control><addata><au>Jaewon Yang</au><au>Leskovec, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Defining and Evaluating Network Communities Based on Ground-Truth</atitle><btitle>2012 IEEE 12th International Conference on Data Mining</btitle><stitle>icdm</stitle><date>2012-12</date><risdate>2012</risdate><spage>745</spage><epage>754</epage><pages>745-754</pages><issn>1550-4786</issn><eissn>2374-8486</eissn><isbn>1467346497</isbn><isbn>9781467346498</isbn><eisbn>9780769549057</eisbn><eisbn>0769549055</eisbn><coden>IEEPAD</coden><pub>IEEE</pub><doi>10.1109/ICDM.2012.138</doi><tpages>10</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1550-4786
ispartof 2012 IEEE 12th International Conference on Data Mining, 2012, p.745-754
issn 1550-4786
2374-8486
language eng
recordid cdi_ieee_primary_6413740
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Collaboration
Communities
Community detection
Community scoring functions
Image edge detection
Measurement
Modularity
Network communities
Robustness
Social network services
title Defining and Evaluating Network Communities Based on Ground-Truth
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T14%3A03%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Defining%20and%20Evaluating%20Network%20Communities%20Based%20on%20Ground-Truth&rft.btitle=2012%20IEEE%2012th%20International%20Conference%20on%20Data%20Mining&rft.au=Jaewon%20Yang&rft.date=2012-12&rft.spage=745&rft.epage=754&rft.pages=745-754&rft.issn=1550-4786&rft.eissn=2374-8486&rft.isbn=1467346497&rft.isbn_list=9781467346498&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDM.2012.138&rft_dat=%3Cieee_6IE%3E6413740%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769549057&rft.eisbn_list=0769549055&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6413740&rfr_iscdi=true
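
The description in this record names Conductance and Triad-participation-ratio as the two best-performing structural definitions. As a rough illustration only, the sketch below scores a candidate community with the commonly used forms of these two measures: conductance as c_S / (2 m_S + c_S), where m_S counts edges inside the set and c_S counts edges leaving it, and triad participation ratio as the fraction of members that sit in at least one triangle lying entirely inside the set. The function names, the dict-of-sets graph representation, and the toy graph are assumptions made for this example; none of it is taken from the authors' code, and it assumes an undirected simple graph without self-loops.

```python
from itertools import combinations


def conductance(adj, community):
    """Return c_S / (2*m_S + c_S) for the node set `community`.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Lower values mean fewer edges leave the community.
    """
    members = set(community)
    internal_twice = 0  # every internal edge is counted from both endpoints
    boundary = 0        # edges with exactly one endpoint in the community
    for u in members:
        for v in adj[u]:
            if v in members:
                internal_twice += 1
            else:
                boundary += 1
    volume = internal_twice + boundary
    return boundary / volume if volume else 0.0


def triad_participation_ratio(adj, community):
    """Return the fraction of members that belong to a triangle inside the set."""
    members = set(community)
    if not members:
        return 0.0
    in_triad = 0
    for u in members:
        inside = [v for v in adj[u] if v in members]
        # u closes a triangle if two of its in-community neighbours are linked
        if any(w in adj[v] for v, w in combinations(inside, 2)):
            in_triad += 1
    return in_triad / len(members)


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

triangle_scores = (
    conductance(adj, {"a", "b", "c"}),
    triad_participation_ratio(adj, {"a", "b", "c"}),
)
tail_scores = (
    conductance(adj, {"c", "d", "e"}),
    triad_participation_ratio(adj, {"c", "d", "e"}),
)
```

On the toy graph the triangle {a, b, c} gets a lower conductance and a higher triad participation ratio than the path {c, d, e}, which matches the intuition that it is the more community-like set. Plain dictionaries keep the example dependency-free; for graphs at the scale the description mentions, a sparse adjacency structure would be the more realistic choice.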