Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation
Published in: | Proceedings of the ACM on management of data 2023-11, Vol.1 (3), p.1-25, Article 215 |
---|---|
Main authors: | Feng, Zijin; Qiao, Miao; Cheng, Hong |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 25 |
---|---|
container_issue | 3 |
container_start_page | 1 |
container_title | Proceedings of the ACM on management of data |
container_volume | 1 |
creator | Feng, Zijin; Qiao, Miao; Cheng, Hong |
description | A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense innercluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and its clustering quality, which depends heavily on the choice of random graph model. The random graph model decides not only which clustering is preferred (modularity measures the quality of a clustering by its alignment with the edges of a random graph) but also the cost of computing that alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or require expensive alignment computation, which prevents the clustering from scaling up. This paper proposes a new random hypergraph model called the Hyperedge Expansion Model (HEM); a non-AON hypergraph modularity function based on HEM, called Partial Innerclusteredge modularity (PI); a clustering algorithm that optimizes PI, called Partial Innerclusteredge Clustering (PIC); and novel computational optimizations. PIC is a scalable modularity-based hypergraph clustering method that can effectively capture the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in both clustering quality and scalability, and is up to five orders of magnitude faster than the baseline methods. |
doi_str_mv | 10.1145/3617335 |
format | Article |
date | 2023-11-13 |
orcid | 0000-0002-9746-8253; 0000-0002-4673-2587; 0000-0001-8374-140X |
publisher | New York, NY, USA: ACM |
fulltext | fulltext |
identifier | ISSN: 2836-6573 |
ispartof | Proceedings of the ACM on management of data, 2023-11, Vol.1 (3), p.1-25, Article 215 |
issn | 2836-6573 (ISSN and EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1145_3617335 |
source | ACM Digital Library Complete |
subjects | Cluster analysis; Clustering; Computing methodologies; Data mining; Discrete mathematics; Graph theory; Hypergraphs; Information systems; Information systems applications; Learning paradigms; Machine learning; Mathematics of computing; Unsupervised learning |
title | Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A48%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modularity-based%20Hypergraph%20Clustering:%20Random%20Hypergraph%20Model,%20Hyperedge-cluster%20Relation,%20and%20Computation&rft.jtitle=Proceedings%20of%20the%20ACM%20on%20management%20of%20data&rft.au=Feng,%20Zijin&rft.date=2023-11-13&rft.volume=1&rft.issue=3&rft.spage=1&rft.epage=25&rft.pages=1-25&rft.artnum=215&rft.issn=2836-6573&rft.eissn=2836-6573&rft_id=info:doi/10.1145/3617335&rft_dat=%3Cacm_cross%3E3617335%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
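The abstract describes modularity as scoring a clustering by how well its inner-cluster edges align with, or exceed, what a random graph model predicts, and it contrasts All-Or-Nothing (AON) hyperedge-cluster counting with non-AON counting. The Python sketch below illustrates both ideas under stated assumptions. `modularity` is the standard Newman-Girvan modularity for a dyadic graph with the configuration-model null, written as Q = sum over clusters c of (e_c/m - (d_c/(2m))^2). `aon_inner` counts a hyperedge only when all its nodes share one cluster. `partial_inner` is a hypothetical partial-credit score chosen purely for illustration. None of these functions is the paper's HEM model or PI modularity, whose formulas this record does not give.

```python
from collections import defaultdict


def modularity(edges, cluster):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    Assumes the configuration-model null: Q = sum_c [ e_c/m - (d_c/(2m))^2 ],
    where e_c = number of edges inside cluster c and d_c = total degree of c.
    """
    m = len(edges)
    inner = defaultdict(int)   # e_c: edges with both endpoints in cluster c
    degree = defaultdict(int)  # d_c: sum of node degrees in cluster c
    for u, v in edges:
        degree[cluster[u]] += 1
        degree[cluster[v]] += 1
        if cluster[u] == cluster[v]:
            inner[cluster[u]] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def aon_inner(hyperedges, cluster):
    """All-Or-Nothing: count hyperedges whose nodes all fall in one cluster."""
    return sum(1 for e in hyperedges if len({cluster[v] for v in e}) == 1)


def partial_inner(hyperedges, cluster):
    """Hypothetical non-AON score (NOT the paper's PI): for each hyperedge,
    add the fraction of its nodes that lie in its most common cluster."""
    total = 0.0
    for e in hyperedges:
        counts = defaultdict(int)
        for v in e:
            counts[cluster[v]] += 1
        total += max(counts.values()) / len(e)
    return total


# Toy data: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
hyperedges = [(0, 1, 2, 3), (0, 1, 2), (3, 4, 5)]

print(round(modularity(edges, cluster), 4))  # 0.3571 (= 5/14)
print(aon_inner(hyperedges, cluster))        # 2
print(partial_inner(hyperedges, cluster))    # 2.75
```

Under AON counting, the hyperedge (0, 1, 2, 3) contributes nothing even though three of its four nodes share cluster A; the partial score keeps that group-wise signal as 0.75. This is the kind of information loss the abstract attributes to AON models.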