Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation

A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense innercluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs...
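The sketch below gives background on the modularity objective named above. It computes the classic modularity for ordinary (dyadic) graphs under the configuration model, which is the standard random graph model. It is not the paper's hypergraph measure. The function name `modularity` and its edge-list and label-dict inputs are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Classic (dyadic) modularity of a clustering; illustrative helper, not from the paper.

    Q = sum over clusters c of [ L_c / m - (d_c / (2m))^2 ], where m is the
    number of edges, L_c the number of edges inside cluster c, and d_c the
    total degree of c. The squared term is the expected fraction of edges
    inside c under the configuration (random graph) model.
    """
    m = len(edges)
    inner = defaultdict(int)   # L_c per cluster
    degree = defaultdict(int)  # d_c per cluster
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inner[labels[u]] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by the edge (2, 3); each triangle is one cluster.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, labels))  # about 0.357
```

In this example each cluster contains 3 of the 7 edges and half of the total degree, so Q = 2(3/7 - 1/4) = 5/14. Putting all nodes into a single cluster gives Q = 0.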

Bibliographic details
Published in: Proceedings of the ACM on Management of Data, 2023-11, Vol. 1 (3), pp. 1-25, Article 215
Main authors: Feng, Zijin; Qiao, Miao; Cheng, Hong
Format: Article
Language: English
container_end_page 25
container_issue 3
container_start_page 1
container_title Proceedings of the ACM on management of data
container_volume 1
creator Feng, Zijin
Qiao, Miao
Cheng, Hong
description A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense innercluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and its clustering quality, which depends heavily on the choice of random graph model. The random graph model decides not only which clustering is preferred (modularity measures the quality of a clustering by its alignment to the edges of a random graph) but also the cost of computing such an alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or incur expensive alignment computation, preventing the clustering from scaling up. This paper proposes a new random hypergraph model called Hyperedge Expansion Model (HEM), a non-AON hypergraph modularity function called Partial Innerclusteredge modularity (PI) based on HEM, a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI, and novel computation optimizations. PIC is a scalable modularity-based hypergraph clustering algorithm that can effectively capture the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in terms of both clustering quality and scalability, and is up to five orders of magnitude faster than the baseline methods.
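The sketch below makes the All-Or-Nothing limitation described in the abstract concrete. Both functions are hypothetical illustrations. They are not the paper's PI or HEM, whose definitions are not reproduced in this record.

* `aon_inner` scores a hyperedge 1 only when all of its nodes share a cluster.
* `pairwise_inner` is an assumed partial score: the share of the hyperedge's node pairs that fall in the same cluster.

```python
from itertools import combinations


def aon_inner(hyperedge, labels):
    """All-Or-Nothing: 1.0 only if every node of the hyperedge is in one cluster."""
    return 1.0 if len({labels[v] for v in hyperedge}) == 1 else 0.0


def pairwise_inner(hyperedge, labels):
    """Hypothetical partial score (not the paper's PI): share of co-clustered node pairs."""
    pairs = list(combinations(hyperedge, 2))
    if not pairs:
        return 1.0  # a single-node hyperedge lies trivially inside one cluster
    return sum(labels[u] == labels[v] for u, v in pairs) / len(pairs)


labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
split_3_1 = (0, 1, 2, 3)  # three nodes in A, one in B
split_2_2 = (0, 1, 3, 4)  # two nodes in A, two in B
print(aon_inner(split_3_1, labels), aon_inner(split_2_2, labels))  # 0.0 0.0
print(pairwise_inner(split_3_1, labels), pairwise_inner(split_2_2, labels))
```

Consider a four-node hyperedge split 3+1 versus one split 2+2 across two clusters. `aon_inner` returns 0 for both. `pairwise_inner` gives 0.5 and 1/3, so a partial score can still tell the more cohesive split apart, which is the group-wise information an AON measure discards.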
doi_str_mv 10.1145/3617335
format Article
publisher New York, NY, USA: ACM
date 2023-11-13
addtitle ACM PACMMOD
orcidid https://orcid.org/0000-0002-9746-8253
https://orcid.org/0000-0002-4673-2587
https://orcid.org/0000-0001-8374-140X
fulltext fulltext
identifier ISSN: 2836-6573
ispartof Proceedings of the ACM on management of data, 2023-11, Vol.1 (3), p.1-25, Article 215
issn 2836-6573
2836-6573
language eng
recordid cdi_crossref_primary_10_1145_3617335
source ACM Digital Library Complete
subjects Cluster analysis
Clustering
Computing methodologies
Data mining
Discrete mathematics
Graph theory
Hypergraphs
Information systems
Information systems applications
Learning paradigms
Machine learning
Mathematics of computing
Unsupervised learning
title Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A48%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modularity-based%20Hypergraph%20Clustering:%20Random%20Hypergraph%20Model,%20Hyperedge-cluster%20Relation,%20and%20Computation&rft.jtitle=Proceedings%20of%20the%20ACM%20on%20management%20of%20data&rft.au=Feng,%20Zijin&rft.date=2023-11-13&rft.volume=1&rft.issue=3&rft.spage=1&rft.epage=25&rft.pages=1-25&rft.artnum=215&rft.issn=2836-6573&rft.eissn=2836-6573&rft_id=info:doi/10.1145/3617335&rft_dat=%3Cacm_cross%3E3617335%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true