Bayesian identification of bots using temporal analysis of tweet storms

Bibliographic details
Published in: Social Network Analysis and Mining, 2021-12, Vol. 11 (1), p. 74, Article 74
Main authors: Kirn, Spencer Lee; Hinders, Mark K.
Format: Article
Language: English
Online access: Full text
Description
Summary: The key to identifying automated activity on social media is to isolate and analyze individual tweet storms that show how an account interacts with the twitterverse over time. In this work we propose the Dynamic Wavelet Fingerprint (DWFP) as a way to identify and flag this activity. Time-series representations of tweet storms are constructed using post metadata, and the DWFP converts these into binary images using a wavelet transform. To describe each tweet storm, features are extracted from the account metadata, tweet metadata, and DWFP images and then passed to a probabilistic classifier. We test three Bayesian inference models: Multinomial Naïve Bayes, Gaussian Naïve Bayes, and Ensemble Naïve Bayes (ENB). Using Bayesian inference structures allows us to propagate information between tweet storms by passing the posterior bot probability from one tweet storm as the prior assumption for the following tweet storm. For this proof-of-concept work we use a small, unambiguous dataset of 777 verified humans and 223 known bot accounts. We find that the ENB model with four classifiers in the ensemble (decision tree, support vector machine, multi-layer perceptron, and logistic regression) provides the best results, with a classification accuracy of 98.5% and an F-score of 0.96 on the withheld validation data.
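The posterior-to-prior propagation described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the initial prior, and the per-storm likelihood values are all hypothetical, and the likelihoods stand in for whatever the trained classifier would report for each tweet storm.

```python
from typing import Iterable, List, Tuple


def update_bot_probability(prior: float, lik_bot: float, lik_human: float) -> float:
    """Apply Bayes' rule once: P(bot | storm) from a prior and two likelihoods."""
    evidence = lik_bot * prior + lik_human * (1.0 - prior)
    if evidence == 0.0:
        # Storm is impossible under both hypotheses; keep the prior unchanged.
        return prior
    return lik_bot * prior / evidence


def propagate(prior: float, storms: Iterable[Tuple[float, float]]) -> List[float]:
    """Chain updates over an account's tweet storms in time order.

    Each element of `storms` is a hypothetical pair
    (P(features | bot), P(features | human)) for one tweet storm.
    The posterior after each storm becomes the prior for the next one.
    """
    posteriors = []
    for lik_bot, lik_human in storms:
        prior = update_bot_probability(prior, lik_bot, lik_human)
        posteriors.append(prior)
    return posteriors


if __name__ == "__main__":
    # Illustrative values only: an uninformed starting prior and made-up likelihoods.
    for i, p in enumerate(propagate(0.5, [(0.8, 0.2), (0.6, 0.4), (0.9, 0.1)]), start=1):
        print(f"storm {i}: P(bot) = {p:.3f}")
```

Under these assumptions, a run of storms that each look more bot-like than human-like pushes the probability steadily toward 1, while an uninformative storm (equal likelihoods) leaves it where it was.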
ISSN: 1869-5450, 1869-5469
DOI: 10.1007/s13278-021-00783-7