Comparing Methods for Creating a National Random Sample of Twitter Users

Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their compar...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Alizadeh, Meysam, Zare, Darya, Samei, Zeynab, Alizadeh, Mohammadamin, Kubli, Mael, Aliahmadi, Mohammadhadi, Ebrahimi, Sarvenaz, Gilardi, Fabrizio
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Social and Information Networks
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. Then, we compare the methods according to their tweet- and user-level metrics as well as their accuracy in estimating US population with and without using inclusion probabilities of various demographics. Our results show that the 1% Stream method performs differently than others in tweet- and user-level metrics, and best for the construction of a population representative sample. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best method to use.
DOI:	10.48550/arxiv.2402.04879