Estimating network properties from snowball sampled data


Bibliographic details
Published in: Social Networks 2012-10, Vol. 34 (4), pp. 701-711
Main authors: Illenberger, Johannes; Flötteröd, Gunnar
Format: Article
Language: English
Description
Abstract:
► We present a technique to correct for the "degree bias" of snowball-sampled data.
► The correction technique is validated with Monte Carlo simulations.
► The estimation of the mean degree and the clustering coefficient is precise.
► The estimation of the degree correlation is not reliable.
► The correction technique has no complex computational requirements.
This article addresses the estimation of topological network parameters from data obtained with a snowball sampling design. An approximate expression for the probability of a vertex to be included in the sample is derived. Based on this sampling distribution, estimators for the mean degree, the degree correlation, and the clustering coefficient are proposed. The performance of these estimators and their sensitivity to the response rate are validated through Monte Carlo simulations on several test networks. Our approach has no complex computational requirements and is straightforward to apply to real-world survey data. In a snowball sampling design, each respondent is typically surveyed only once. Unlike the widely used estimator for Respondent-Driven Sampling (RDS), which assumes sampling with replacement, the proposed approach assumes sampling without replacement and is therefore also applicable to large sample fractions. From the simulation experiments, we conclude that the estimation quality decreases with increasing variance of the network degree distribution. However, if the degree distribution is not too broad, our approach yields good estimates of the mean degree and the clustering coefficient, which, moreover, are almost independent of the response rate. The estimates of the degree correlation are of moderate quality.
ISSN:0378-8733
1879-2111
DOI:10.1016/j.socnet.2012.09.001
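The abstract does not reproduce the paper's inclusion-probability expression, so the Python sketch below only illustrates the general idea of correcting the degree bias of a snowball sample by weighting each sampled vertex with the inverse of its inclusion probability. Everything in it is an assumption for illustration, not the authors' method: the approximation pi_i ≈ 1 - (1 - f)^k_i (f = fraction of all vertices that responded in the previous wave), the helper names `inclusion_probability`, `weighted_mean_degree`, `chung_lu_graph` and `snowball`, the Chung-Lu test network, and the simplification that the degree of every recruited vertex is observed.

```python
import random


def inclusion_probability(degree: int, frac_prev: float) -> float:
    """Hypothetical approximation (not the paper's expression): a vertex is
    recruited if at least one of its `degree` neighbours was a respondent in
    the previous wave, each neighbour independently with probability frac_prev."""
    return 1.0 - (1.0 - frac_prev) ** degree


def weighted_mean_degree(degrees, probs):
    """Hajek-style inverse-probability-weighted mean: each sampled vertex
    counts with weight 1/pi_i, which down-weights high-degree vertices."""
    weights = [1.0 / p for p in probs]
    return sum(w * k for w, k in zip(weights, degrees)) / sum(weights)


def chung_lu_graph(n, rng, scale=3.0, alpha=2.5):
    """Hypothetical test network with a broad degree distribution:
    edge (i, j) exists with probability w_i * w_j / sum(w)."""
    w = [scale * rng.paretovariate(alpha) for _ in range(n)]
    total = sum(w)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, w[i] * w[j] / total):
                adj[i].append(j)
                adj[j].append(i)
    return adj


def snowball(adj, n_seeds, n_waves, response_rate, rng):
    """Snowball sampling without replacement: each respondent names all of its
    neighbours once. Returns (degree, frac_prev) for every non-seed vertex;
    assumes (simplification) that every recruited vertex's degree is observed."""
    sampled = set(rng.sample(range(len(adj)), n_seeds))
    wave = list(sampled)
    recruited = []
    for _ in range(n_waves):
        respondents = [v for v in wave if rng.random() < response_rate]
        frac_prev = len(respondents) / len(adj)
        wave = []
        for v in respondents:
            for u in adj[v]:
                if u not in sampled:  # without replacement: recruit each vertex once
                    sampled.add(u)
                    wave.append(u)
        recruited.extend((len(adj[u]), frac_prev) for u in wave)
    return recruited


if __name__ == "__main__":
    rng = random.Random(42)
    adj = chung_lu_graph(1000, rng)
    true_mean = sum(len(a) for a in adj) / len(adj)
    sample = snowball(adj, n_seeds=5, n_waves=2, response_rate=0.7, rng=rng)
    if sample:
        degrees = [k for k, _ in sample]
        probs = [inclusion_probability(k, f) for k, f in sample]
        print(f"true mean degree:  {true_mean:.2f}")
        print(f"naive sample mean: {sum(degrees) / len(degrees):.2f}")
        print(f"weighted estimate: {weighted_mean_degree(degrees, probs):.2f}")
```

For a small sample fraction, pi_i in this sketch is roughly proportional to k_i, so the weighted estimate approaches the harmonic mean of the sampled degrees, which undoes degree-proportional (size-biased) recruitment. The naive sample mean is expected to overshoot on networks with a broad degree distribution, in line with the degree bias the abstract describes.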