Waddling Random Walk: Fast and Accurate Mining of Motif Statistics in Large Graphs
Algorithms for mining very large graphs, such as those representing online social networks, to discover the relative frequency of small subgraphs within them are of high interest to sociologists, computer scientists and marketeers alike. However, the computation of these network motif statistics via...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Algorithms for mining very large graphs, such as those representing online
social networks, to discover the relative frequency of small subgraphs within
them are of high interest to sociologists, computer scientists and marketeers
alike. However, the computation of these network motif statistics via naive
enumeration is infeasible for either its prohibitive computational costs or
access restrictions on the full graph data. Methods to estimate the motif
statistics based on random walks by sampling only a small fraction of the
subgraphs in the large graph address both of these challenges. In this paper,
we present a new algorithm, called the Waddling Random Walk (WRW), which
estimates the concentration of motifs of any size. It derives its name from the
fact that it sways a little to the left and to the right, thus also sampling
nodes not directly on the path of the random walk. The WRW algorithm achieves
its computational efficiency by not trying to enumerate subgraphs around the
random walk but instead using a randomized protocol to sample subgraphs in the
neighborhood of the nodes visited by the walk. In addition, WRW achieves
significantly higher accuracy (measured by the closeness of its estimate to the
correct value) and higher precision (measured by the low variance in its
estimations) than the current state-of-the-art algorithms for mining subgraph
statistics. We illustrate these advantages in speed, accuracy and precision
using simulations on well-known and widely used graph datasets representing
real networks. |
---|---|
DOI: | 10.48550/arxiv.1605.09776 |