Exploring Behavioral Tendencies on Social Media: A Perspective Through Claim Check-Worthiness
Main author: | |
---|---|
Format: | Dataset |
Language: | eng |
Online access: | Order full text |
Summary: | Randomly Sampled Users Dataset (RSU.csv)
This dataset consists of 11,173 users collected through Twitter's APIs. We collected 10,000 random English tweets in February 2023 using Twitter's Volume Stream API. The tweets were posted by around 3,000 users. For each user, we collected up to 100 of their most recent followees using Twitter's Following API. Then, through the Timeline and Liking APIs, we collected each user's most recent tweets (up to 3,200, Twitter's limit) and liked tweets (also up to 3,200). We filtered out users with insufficient tweets (fewer than 100 original tweets or fewer than 80 retweets/liked tweets) so that each user's sample is large enough to support our statistical analyses. The final dataset contains 11,173 users and 40,405,150 tweets.
Humanities Dataset (HUM.csv)
This dataset contains 341,285 tweets and 498 Twitter accounts from selected Twitter lists including Book Author, Christianity, Artists, Buddhism, Musician, and Philosophers. We used Twitter's List and Timeline APIs to collect the accounts and their most recent tweets (up to 1,000 per account). The dataset was collected in January 2024.
Politics Dataset (POL.csv)
This dataset contains all tweets from selected U.S. news media and U.S. politicians including Senators, House Members, US Governors, US Secretaries of State, US Cabinet, and US Election Officials at collection time. We used Twitter's Timeline API to collect the accounts' tweets (up to 3,200 tweets). The dataset was collected in May 2023, with 8,153,745 tweets and 3,784 Twitter accounts.
Data Fields
Due to Twitter's content redistribution policy, we are only allowed to publish tweet IDs and user IDs. Therefore, in each dataset, each row (datapoint) represents a tweet and contains two fields: tweet_id and user_id. |
---|---|
DOI: | 10.5281/zenodo.11081025 |
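
As a rough illustration of the two-field layout described under Data Fields, the following sketch counts how many tweet rows each user has in one of the CSV files. It assumes the file has a header row with the column names `tweet_id` and `user_id`; the record does not confirm this, so adjust if the published files differ. The function name `count_tweets_per_user` and the sample IDs are hypothetical and used only for illustration.

```python
import csv
import io
from collections import Counter


def count_tweets_per_user(csv_file):
    """Return a Counter mapping user_id -> number of tweet rows.

    Assumes (not confirmed by the record) a header row naming the
    columns tweet_id and user_id.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # IDs stay as strings: they are identifiers, not quantities,
        # and string handling avoids any numeric precision issues.
        counts[row["user_id"]] += 1
    return counts


# Tiny in-memory stand-in for a file such as RSU.csv (made-up IDs).
sample = io.StringIO("tweet_id,user_id\n1001,42\n1002,42\n1003,7\n")
counts = count_tweets_per_user(sample)
print(counts.most_common(1))
```

With a downloaded file, the same function could be called as `with open("RSU.csv", newline="") as f: count_tweets_per_user(f)`. Because only IDs are published, tweet text and metadata would have to be rehydrated separately before any content analysis.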