Hyper-distance oracles in hypergraphs

Bibliographic details
Published in: The VLDB journal, 2024-09, Vol. 33 (5), pp. 1333-1356
Main authors: Preti, Giulia; De Francisci Morales, Gianmarco; Bonchi, Francesco
Format: Article
Language: English
Description
Abstract: We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the level of overlap required for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding the materialization of the line graph. Our framework approximately answers vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks by identifying the s-connected components of the hypergraph. For this latter task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
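To make the notions of s-adjacency and s-connected components in the abstract concrete, the following is a minimal Python sketch. It is not the HypED implementation: the function name `s_connected_components`, the dictionary representation of the hypergraph, and the toy data are all hypothetical, and it uses a plain static inverted index (vertex to hyperedges) where the article describes a dynamic one. It assumes two hyperedges are s-adjacent when they share at least s vertices, and it leaves a hyperedge with no s-adjacent neighbor as a singleton component.

```python
from collections import defaultdict


def s_connected_components(hyperedges, s):
    """Group hyperedge ids into s-connected components (illustrative sketch).

    hyperedges: dict mapping a hyperedge id to its set of vertices.
    s: positive integer; two hyperedges are treated as s-adjacent
       when they share at least s vertices (assumption stated above).
    """
    # Union-find forest over hyperedge ids.
    parent = {e: e for e in hyperedges}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Static inverted index: vertex -> hyperedges that contain it.
    index = defaultdict(list)
    for e, vertices in hyperedges.items():
        for v in vertices:
            index[v].append(e)

    # For each hyperedge, count shared vertices with every hyperedge
    # it touches, and merge the pairs whose overlap reaches s.
    for e, vertices in hyperedges.items():
        if len(vertices) < s:
            continue  # too small to be s-adjacent to anything
        overlap = defaultdict(int)
        for v in vertices:
            for f in index[v]:
                if f != e:
                    overlap[f] += 1
        for f, shared in overlap.items():
            if shared >= s:
                union(e, f)

    components = defaultdict(set)
    for e in hyperedges:
        components[find(e)].add(e)
    return sorted(sorted(c) for c in components.values())


# Hypothetical toy hypergraph with four hyperedges.
H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {4, 5}, "e4": {5, 6}}
for s in (1, 2, 3):
    print(s, s_connected_components(H, s))
```

On this toy input the four hyperedges form a single component for s = 1, split into {e1, e2}, {e3}, {e4} for s = 2, and become four singletons for s = 3, which illustrates the fragmentation that the abstract says can guide landmark placement.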
ISSN: 1066-8888, 0949-877X
DOI: 10.1007/s00778-024-00851-2