Comparing Methods for Creating a National Random Sample of Twitter Users
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Twitter data has been widely used by researchers across various social and
computer science disciplines. A common aim when working with Twitter data is
the construction of a random sample of users from a given country. However,
while several methods have been proposed in the literature, their comparative
performance is mostly unexplored. In this paper, we implement four common
methods to collect a random sample of Twitter users in the US: 1% Stream,
Bounding Box, Location Query, and Language Query. Then, we compare the methods
according to their tweet- and user-level metrics as well as their accuracy in
estimating the US population with and without using inclusion probabilities of
various demographics. Our results show that the 1% Stream method performs
differently from the others on tweet- and user-level metrics and is the best for
constructing a population-representative sample. We discuss the conditions
under which the 1% Stream method may not be suitable and suggest the Bounding
Box method as the second-best method to use. |
---|---|
DOI: | 10.48550/arxiv.2402.04879 |