Exact Recovery in the General Hypergraph Stochastic Block Model

Bibliographic details
Published in: IEEE Transactions on Information Theory, 2023-01, Vol. 69 (1), pp. 453-471
Main authors: Zhang, Qiaosheng; Tan, Vincent Y. F.
Format: Article
Language: eng
Subjects:
container_end_page 471
container_issue 1
container_start_page 453
container_title IEEE transactions on information theory
container_volume 69
creator Zhang, Qiaosheng
Tan, Vincent Y. F.
description This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p_{1}, \ldots, p_{k}). Each subset of d nodes is generated independently as an order-d hyperedge with a probability that depends on the ground-truth communities to which the d nodes belong. The goal is to exactly recover the k hidden communities from the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it (apart from a small regime of parameters that will be specified precisely). This threshold is expressed in terms of a quantity that we term the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and for the d-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage uses a hypergraph spectral clustering method to obtain a coarse estimate of the communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.
doi_str_mv 10.1109/TIT.2022.3205959
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2757181657</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9887955</ieee_id><sourcerecordid>2757181657</sourcerecordid><originalsourceid>FETCH-LOGICAL-c338t-b02538e6419b99e7ea786606e471e799770ef924fc2bd143db070a183cc2b3cb3</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhoMoWKt3wUvA89Z8bpKTaKltoSJoPYdsOmu3rs2abMX-e7ds8TS8w_POwIPQNSUjSom5W86XI0YYG3FGpJHmBA2olCozuRSnaEAI1ZkRQp-ji5Q2XRSSsgG6n_w63-JX8OEH4h5XW9yuAU9hC9HVeLZvIH5E16zxWxv82qW28vixDv4TP4cV1JforHR1gqvjHKL3p8lyPMsWL9P5-GGRec51mxWESa4hF9QUxoACp3SekxyEoqCMUYpAaZgoPStWVPBVQRRxVHPfLbgv-BDd9nebGL53kFq7Cbu47V5apqSimuZSdRTpKR9DShFK28Tqy8W9pcQeNNlOkz1oskdNXeWmr1QA8I8brZWRkv8Bl3hh8A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2757181657</pqid></control><display><type>article</type><title>Exact Recovery in the General Hypergraph Stochastic Block Model</title><source>IEEE Electronic Library (IEL)</source><creator>Zhang, Qiaosheng ; Tan, Vincent Y. F.</creator><creatorcontrib>Zhang, Qiaosheng ; Tan, Vincent Y. F.</creatorcontrib><description><![CDATA[This paper investigates fundamental limits of exact recovery in the general <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-uniform hypergraph stochastic block model (<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM), wherein <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> nodes are partitioned into <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> disjoint communities with relative sizes <inline-formula> <tex-math notation="LaTeX">(p_{1},\ldots , p_{k}) </tex-math></inline-formula>. 
Each subset of nodes with cardinality <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> is generated independently as an order-<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> hyperedge with a certain probability that depends on the ground-truth communities that the <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> nodes belong to. The goal is to exactly recover the <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. 
The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.]]></description><identifier>ISSN: 0018-9448</identifier><identifier>EISSN: 1557-9654</identifier><identifier>DOI: 10.1109/TIT.2022.3205959</identifier><identifier>CODEN: IETTAW</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Clustering ; Clustering algorithms ; Clustering methods ; Community detection ; Electronic mail ; exact recovery ; Graph theory ; Graphs ; hypergraph spectral clustering methods ; hypergraph stochastic block model (HSBM) ; Nodes ; Partitioning algorithms ; Polynomials ; Random variables ; Recovery ; Stochastic processes ; Tensors</subject><ispartof>IEEE transactions on information theory, 2023-01, Vol.69 (1), p.453-471</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c338t-b02538e6419b99e7ea786606e471e799770ef924fc2bd143db070a183cc2b3cb3</citedby><cites>FETCH-LOGICAL-c338t-b02538e6419b99e7ea786606e471e799770ef924fc2bd143db070a183cc2b3cb3</cites><orcidid>0000-0001-6114-8453 ; 0000-0002-5008-4527</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9887955$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9887955$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhang, Qiaosheng</creatorcontrib><creatorcontrib>Tan, Vincent Y. 
F.</creatorcontrib><title>Exact Recovery in the General Hypergraph Stochastic Block Model</title><title>IEEE transactions on information theory</title><addtitle>TIT</addtitle><description><![CDATA[This paper investigates fundamental limits of exact recovery in the general <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-uniform hypergraph stochastic block model (<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM), wherein <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> nodes are partitioned into <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> disjoint communities with relative sizes <inline-formula> <tex-math notation="LaTeX">(p_{1},\ldots , p_{k}) </tex-math></inline-formula>. Each subset of nodes with cardinality <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> is generated independently as an order-<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> hyperedge with a certain probability that depends on the ground-truth communities that the <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> nodes belong to. The goal is to exactly recover the <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM with two symmetric communities as special cases. 
En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.]]></description><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Community detection</subject><subject>Electronic mail</subject><subject>exact recovery</subject><subject>Graph theory</subject><subject>Graphs</subject><subject>hypergraph spectral clustering methods</subject><subject>hypergraph stochastic block model (HSBM)</subject><subject>Nodes</subject><subject>Partitioning algorithms</subject><subject>Polynomials</subject><subject>Random variables</subject><subject>Recovery</subject><subject>Stochastic processes</subject><subject>Tensors</subject><issn>0018-9448</issn><issn>1557-9654</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhoMoWKt3wUvA89Z8bpKTaKltoSJoPYdsOmu3rs2abMX-e7ds8TS8w_POwIPQNSUjSom5W86XI0YYG3FGpJHmBA2olCozuRSnaEAI1ZkRQp-ji5Q2XRSSsgG6n_w63-JX8OEH4h5XW9yuAU9hC9HVeLZvIH5E16zxWxv82qW28vixDv4TP4cV1JforHR1gqvjHKL3p8lyPMsWL9P5-GGRec51mxWESa4hF9QUxoACp3SekxyEoqCMUYpAaZgoPStWVPBVQRRxVHPfLbgv-BDd9nebGL53kFq7Cbu47V5apqSimuZSdRTpKR9DShFK28Tqy8W9pcQeNNlOkz1oskdNXeWmr1QA8I8brZWRkv8Bl3hh8A</recordid><startdate>202301</startdate><enddate>202301</enddate><creator>Zhang, Qiaosheng</creator><creator>Tan, Vincent Y. F.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6114-8453</orcidid><orcidid>https://orcid.org/0000-0002-5008-4527</orcidid></search><sort><creationdate>202301</creationdate><title>Exact Recovery in the General Hypergraph Stochastic Block Model</title><author>Zhang, Qiaosheng ; Tan, Vincent Y. F.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c338t-b02538e6419b99e7ea786606e471e799770ef924fc2bd143db070a183cc2b3cb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Community detection</topic><topic>Electronic mail</topic><topic>exact recovery</topic><topic>Graph theory</topic><topic>Graphs</topic><topic>hypergraph spectral clustering methods</topic><topic>hypergraph stochastic block model (HSBM)</topic><topic>Nodes</topic><topic>Partitioning algorithms</topic><topic>Polynomials</topic><topic>Random variables</topic><topic>Recovery</topic><topic>Stochastic processes</topic><topic>Tensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Qiaosheng</creatorcontrib><creatorcontrib>Tan, Vincent Y. 
F.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on information theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Qiaosheng</au><au>Tan, Vincent Y. F.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exact Recovery in the General Hypergraph Stochastic Block Model</atitle><jtitle>IEEE transactions on information theory</jtitle><stitle>TIT</stitle><date>2023-01</date><risdate>2023</risdate><volume>69</volume><issue>1</issue><spage>453</spage><epage>471</epage><pages>453-471</pages><issn>0018-9448</issn><eissn>1557-9654</eissn><coden>IETTAW</coden><abstract><![CDATA[This paper investigates fundamental limits of exact recovery in the general <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-uniform hypergraph stochastic block model (<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM), wherein <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> nodes are partitioned into <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> disjoint communities with relative sizes <inline-formula> <tex-math notation="LaTeX">(p_{1},\ldots , p_{k}) 
</tex-math></inline-formula>. Each subset of nodes with cardinality <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> is generated independently as an order-<inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> hyperedge with a certain probability that depends on the ground-truth communities that the <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula> nodes belong to. The goal is to exactly recover the <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and <inline-formula> <tex-math notation="LaTeX">d </tex-math></inline-formula>-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TIT.2022.3205959</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-6114-8453</orcidid><orcidid>https://orcid.org/0000-0002-5008-4527</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9448
ispartof IEEE transactions on information theory, 2023-01, Vol.69 (1), p.453-471
issn 0018-9448
1557-9654
language eng
recordid cdi_proquest_journals_2757181657
source IEEE Electronic Library (IEL)
subjects Algorithms
Clustering
Clustering algorithms
Clustering methods
Community detection
Electronic mail
exact recovery
Graph theory
Graphs
hypergraph spectral clustering methods
hypergraph stochastic block model (HSBM)
Nodes
Partitioning algorithms
Polynomials
Random variables
Recovery
Stochastic processes
Tensors
title Exact Recovery in the General Hypergraph Stochastic Block Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T15%3A47%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exact%20Recovery%20in%20the%20General%20Hypergraph%20Stochastic%20Block%20Model&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Zhang,%20Qiaosheng&rft.date=2023-01&rft.volume=69&rft.issue=1&rft.spage=453&rft.epage=471&rft.pages=453-471&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2022.3205959&rft_dat=%3Cproquest_RIE%3E2757181657%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2757181657&rft_id=info:pmid/&rft_ieee_id=9887955&rfr_iscdi=true
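The description above outlines the d-HSBM and a two-stage recovery algorithm. The following Python sketch is a hedged illustration only and is not the authors' algorithm. It samples a small d-HSBM instance and then runs one pass of node-wise maximum-likelihood refinement, in the spirit of the second stage.

The following choices are hypothetical and were made only for illustration:
- the helper names example_prob, sample_hsbm and refine;
- the probability values 0.9 and 0.1;
- the problem sizes.

The spectral first stage is not implemented. Instead, a deliberately perturbed copy of the true labels stands in for its coarse estimate.

```python
# Hypothetical illustration of a d-HSBM and node-wise refinement;
# not the paper's algorithm. Exhaustive O(n^d) enumeration, tiny n only.
import itertools
import math
import random


def example_prob(key):
    """Hypothetical hyperedge probability: 0.9 if all d nodes share a
    community, 0.1 otherwise. Values must lie strictly in (0, 1)."""
    return 0.9 if len(set(key)) == 1 else 0.1


def sample_hsbm(n, d, sizes, edge_prob, seed=0):
    """Sample labels and hyperedges from a d-uniform HSBM.

    Every d-subset of the n nodes is included independently with
    probability edge_prob(key), where key is the sorted tuple of the
    subset's community labels (the multiset of its communities).
    """
    assert sum(sizes) == n
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for subset in itertools.combinations(range(n), d):
        key = tuple(sorted(labels[v] for v in subset))
        if rng.random() < edge_prob(key):
            edges.append(subset)  # combinations yields sorted tuples
    return labels, edges


def refine(labels_hat, edges, n, d, k, edge_prob):
    """One pass of node-wise maximum-likelihood refinement.

    For each node v, keep the other nodes' coarse labels fixed and pick
    the community c maximizing the log-likelihood of every d-subset that
    contains v (present and absent hyperedges alike).
    """
    present = set(edges)
    refined = list(labels_hat)
    for v in range(n):
        others = [u for u in range(n) if u != v]
        best_c, best_ll = labels_hat[v], -math.inf
        for c in range(k):
            trial = list(labels_hat)
            trial[v] = c
            ll = 0.0
            for rest in itertools.combinations(others, d - 1):
                subset = tuple(sorted(rest + (v,)))
                p = edge_prob(tuple(sorted(trial[u] for u in subset)))
                ll += math.log(p) if subset in present else math.log1p(-p)
            if ll > best_ll:
                best_c, best_ll = c, ll
        refined[v] = best_c
    return refined


# Usage: 30 nodes, 3-uniform hyperedges, two communities of 15 nodes.
labels, edges = sample_hsbm(30, 3, [15, 15], example_prob, seed=1)
coarse = list(labels)
for v in (0, 5, 20):  # simulate three first-stage mistakes
    coarse[v] = 1 - coarse[v]
print(refine(coarse, edges, 30, 3, 2, example_prob) == labels)
```

The sketch assumes that the coarse labels are already aligned with the true community indices. A real first-stage estimate determines communities only up to relabeling, so an alignment step would be needed.

Enumerating all d-subsets makes both functions cost on the order of n^d operations. This is acceptable for a toy instance but far from the polynomial-time guarantees discussed in the paper's setting.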