Exploring Behavioral Tendencies on Social Media: A Perspective Through Claim Check-Worthiness

Bibliographic Details
Main Author:	The University of Texas at Arlington
Format:	Dataset
Language:	eng

Description
Summary:

Randomly Sampled Users Dataset (RSU.csv)
This dataset consists of 11,173 users collected through Twitter's APIs. We collected 10,000 random English tweets in February 2023 using Twitter's Volume Stream API. The tweets were posted by around 3,000 users. For each user, we collected up to 100 of their most recent followees using Twitter's Following API. Through the Timeline and Liking APIs, we collected each user's most recent tweets (up to 3,200, Twitter's limit) and liked tweets (also up to 3,200). We then filtered out users with too few tweets (fewer than 100 original tweets, or fewer than 80 retweets/liked tweets) so that per-user sample sizes are large enough to support statistical analysis. The final dataset has 11,173 users and 40,405,150 tweets.

Humanities Dataset (HUM.csv)
This dataset contains 341,285 tweets and 498 Twitter accounts from selected Twitter lists, including Book Author, Christianity, Artists, Buddhism, Musician, and Philosophers. We used Twitter's List and Timeline APIs to collect the accounts and their most recent tweets (up to 1,000). The dataset was collected in January 2024.

Politics Dataset (POL.csv)
This dataset contains all tweets from selected U.S. news media and U.S. politicians, including Senators, House Members, US Governors, US Secretaries of State, US Cabinet, and US Election Officials at collection time. We used Twitter's Timeline API to collect the accounts' tweets (up to 3,200 tweets). The dataset was collected in May 2023 and contains 8,153,745 tweets from 3,784 Twitter accounts.

Data Fields
Due to Twitter's content redistribution policy, we are only allowed to publish tweet IDs and user IDs. Therefore, in each dataset, each row (data point) represents a tweet and contains two fields: tweet_id and user_id.
DOI:	10.5281/zenodo.11081025
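
As a minimal sketch of working with the two-field layout described under Data Fields, the Python snippet below counts rows (tweets) per user. It assumes the files are comma-delimited and begin with a header row naming the columns tweet_id and user_id; the record does not confirm either detail. The function name tweets_per_user and the sample IDs are hypothetical and serve only as illustration.

```python
import csv
import io
from collections import Counter


def tweets_per_user(csv_file):
    """Count rows (tweets) per user_id in a two-column ID file.

    Assumes a header row with the columns tweet_id and user_id.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Keep IDs as strings: they are identifiers, not quantities.
        counts[row["user_id"]] += 1
    return counts


# Made-up IDs for illustration only; not taken from the datasets.
sample = io.StringIO(
    "tweet_id,user_id\n"
    "1001,42\n"
    "1002,42\n"
    "1003,7\n"
)
counts = tweets_per_user(sample)
print(counts.most_common())
```

To run the same count on a downloaded file, the in-memory sample could be replaced with something like open("RSU.csv", newline=""), provided the layout assumptions above hold. Because only IDs are published, tweet text and user profiles would have to be retrieved separately.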