Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation

A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense innercluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs...
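
To make the notion of modularity in the summary above concrete, here is a minimal sketch for the dyadic (ordinary graph) case only. It assumes the standard Newman-Girvan modularity, which compares inner-cluster edges with their expectation under a degree-preserving random graph. It is not the HEM model or the PI function proposed in the paper, whose definitions are not given in this record; the function name `modularity` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of a clustering of an undirected, unweighted graph.

    edges      -- list of (u, v) pairs, one per undirected edge
    cluster_of -- dict mapping each vertex to its cluster label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    inner = defaultdict(int)   # number of edges with both endpoints in cluster c
    volume = defaultdict(int)  # sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            inner[cu] += 1
    # Observed inner-edge fraction minus the fraction expected at random.
    return sum(inner[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Toy graph (assumed for illustration): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "a" for v in range(6)}

# Splitting along the bridge scores higher than lumping everything together.
print(modularity(edges, two_clusters) > modularity(edges, one_cluster))
```

A hypergraph version has to decide how a hyperedge that is only partly inside a cluster contributes, which is the All-Or-Nothing versus non-AON distinction the abstract refers to.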

Bibliographic details
Published in:	Proceedings of the ACM on management of data, 2023-11, Vol.1 (3), p.1-25, Article 215
Main authors:	Feng, Zijin; Qiao, Miao; Cheng, Hong
Format:	Article
Language:	eng
Subjects:	Cluster analysis; Clustering; Computing methodologies; Data mining; Discrete mathematics; Graph theory; Hypergraphs; Information systems; Information systems applications; Learning paradigms; Machine learning; Mathematics of computing; Unsupervised learning
Online access:	Full text

container_end_page	25
container_issue	3
container_start_page	1
container_title	Proceedings of the ACM on management of data
container_volume	1
creator	Feng, Zijin; Qiao, Miao; Cheng, Hong
description	A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense innercluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and its clustering quality, which depends heavily on the choice of random graph model. The random graph model determines not only which clustering is preferred (modularity measures the quality of a clustering by its alignment to the edges of a random graph) but also the cost of computing that alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or introduce expensive alignment computation, preventing the clustering from scaling up. This paper proposes a new random hypergraph model called the Hyperedge Expansion Model (HEM), a non-AON hypergraph modularity function called Partial Innerclusteredge modularity (PI) based on HEM, a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI, and novel computation optimizations. PIC is a scalable modularity-based hypergraph clustering method that can effectively capture the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in both clustering quality and scalability and is up to five orders of magnitude faster than the baseline methods.
doi_str_mv	10.1145/3617335
format	Article
publisher	New York, NY, USA: ACM
date	2023-11-13
addtitle	ACM PACMMOD
artnum	215
eissn	2836-6573
orcid	0000-0002-9746-8253; 0000-0002-4673-2587; 0000-0001-8374-140X
rights	ACM
toplevel	peer_reviewed; online_resources
sourceid	acm_cross
sourcetype	Aggregation Database
collection	CrossRef
linktopdf	https://dl.acm.org/doi/pdf/10.1145/3617335
fulltext	fulltext
identifier	ISSN: 2836-6573
ispartof	Proceedings of the ACM on management of data, 2023-11, Vol.1 (3), p.1-25, Article 215
issn	2836-6573 2836-6573
language	eng
recordid	cdi_crossref_primary_10_1145_3617335
source	ACM Digital Library Complete
subjects	Cluster analysis; Clustering; Computing methodologies; Data mining; Discrete mathematics; Graph theory; Hypergraphs; Information systems; Information systems applications; Learning paradigms; Machine learning; Mathematics of computing; Unsupervised learning
title	Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A48%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modularity-based%20Hypergraph%20Clustering:%20Random%20Hypergraph%20Model,%20Hyperedge-cluster%20Relation,%20and%20Computation&rft.jtitle=Proceedings%20of%20the%20ACM%20on%20management%20of%20data&rft.au=Feng,%20Zijin&rft.date=2023-11-13&rft.volume=1&rft.issue=3&rft.spage=1&rft.epage=25&rft.pages=1-25&rft.artnum=215&rft.issn=2836-6573&rft.eissn=2836-6573&rft_id=info:doi/10.1145/3617335&rft_dat=%3Cacm_cross%3E3617335%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true