A Deterministic Hitting-Time Moment Approach to Seed-set Expansion over a Graph
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | We introduce HITMIX, a new technique for network seed-set expansion, i.e.,
the problem of identifying a set of graph vertices related to a given seed-set
of vertices. We use the moments of the graph's hitting-time distribution to
quantify the relationship of each non-seed vertex to the seed-set. This
involves a deterministic calculation for the hitting-time moments that is
scalable in the number of graph edges and so avoids directly sampling a Markov
chain over the graph. The moments are used to fit a mixture model to estimate
the probability that each non-seed vertex should be grouped with the seed set.
This membership probability enables us to sort the non-seeds and threshold in a
statistically justified way. To the best of our knowledge, HITMIX is the first
full statistical model for seed-set expansion that can give vertex-level
membership probabilities. Although HITMIX is a global method, its computational
cost is linear in practice, which makes computations on large graphs feasible. We
have a high-performance implementation, and we present computational results on
stochastic blockmodels and a small-world network from the SNAP repository. The
state of the art for this problem is a collection of recently developed local
methods, and we show that distinct advantages in solution quality are available
if our global method can be used. In practice, we expect to be able to run
HITMIX if the graph can be stored in memory. |
---|---|
DOI: | 10.48550/arxiv.2011.09544 |
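
The summary states that the hitting-time moments are computed deterministically rather than by sampling a Markov chain. As a minimal sketch of that idea (not the authors' implementation), the first two moments of the hitting time to a seed set can be obtained from two linear solves. The function name `hitting_time_moments`, the simple random walk, and the dense NumPy solver are assumptions made for illustration; the record does not describe the solver used, and a scalable version would need sparse matrices and an iterative method. The sketch also assumes that no vertex is isolated and that every non-seed vertex can reach a seed.

```python
import numpy as np

def hitting_time_moments(adj, seeds):
    """First and second moments of the random-walk hitting time to `seeds`.

    Hypothetical helper for illustration: dense solve on a small graph,
    simple random walk with transition matrix P = D^{-1} A.
    """
    adj = np.asarray(adj, dtype=float)
    seed_set = set(seeds)
    non_seeds = np.array([v for v in range(adj.shape[0]) if v not in seed_set])

    # Row-normalize the adjacency matrix to get transition probabilities.
    P = adj / adj.sum(axis=1, keepdims=True)

    # Q holds transitions among non-seed vertices only (seeds are absorbing).
    Q = P[np.ix_(non_seeds, non_seeds)]
    A = np.eye(len(non_seeds)) - Q

    # E[T] solves (I - Q) m1 = 1.
    m1 = np.linalg.solve(A, np.ones(len(non_seeds)))
    # E[T^2] solves (I - Q) m2 = 2*m1 - 1, from conditioning on the first step.
    m2 = np.linalg.solve(A, 2.0 * m1 - 1.0)
    return non_seeds, m1, m2

# Path graph 0 - 1 - 2 with vertex 0 as the only seed.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
nodes, m1, m2 = hitting_time_moments(path, seeds=[0])
print(nodes, m1, m2)  # vertices [1 2], means [3. 4.], second moments [17. 24.]
```

On this path graph both non-seed vertices have hitting-time variance `m2 - m1**2 = 8`. Moments of this kind are the per-vertex quantities that the summary says are then fed to a mixture model; that fitting step is not sketched here.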