Bayesian identification of bots using temporal analysis of tweet storms

The key to identifying automated activity on social media is to isolate and analyze individual tweet storms that show how an account interacts with the twitterverse over time. In this work we propose the Dynamic Wavelet Fingerprint (DWFP) as a way to identify and flag this activity. Time-series repr...
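As an illustration of the wavelet step named in the abstract (a time series converted into a binary image), the following is a minimal sketch and not the authors' implementation. The Ricker wavelet, the set of scales, the number of slices, and the function names `ricker`, `cwt`, and `fingerprint` are all assumptions chosen for illustration; the input is a hypothetical list of per-bin activity counts.

```python
import math


def ricker(t, scale):
    """Ricker (Mexican hat) wavelet evaluated at offset t for a given scale."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal, scales):
    """Naive continuous wavelet transform: one row of coefficients per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(5 * s)  # the wavelet is negligible beyond about 5 scales
        kernel = [ricker(k, s) / math.sqrt(s) for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # treat samples outside the signal as zero
                    acc += signal[idx] * w
            row.append(acc)
        rows.append(row)
    return rows


def fingerprint(coeffs, n_slices=8):
    """Normalize |coefficients| to [0, 1] and set alternating level bands to 1."""
    peak = max(abs(c) for row in coeffs for c in row)
    if peak == 0.0:
        return [[0] * len(row) for row in coeffs]
    image = []
    for row in coeffs:
        pixels = []
        for c in row:
            band = min(int(abs(c) / peak * n_slices), n_slices - 1)
            pixels.append(band % 2)  # odd bands become 1, even bands 0
        image.append(pixels)
    return image


if __name__ == "__main__":
    # Hypothetical activity counts per time bin for one tweet storm.
    counts = [0, 0, 3, 5, 2, 0, 0, 0, 4, 6, 1, 0]
    for row in fingerprint(cwt(counts, scales=[1, 2, 4])):
        print("".join("#" if p else "." for p in row))
```

Each row of the printed image corresponds to one wavelet scale and each column to one time bin; features for a classifier would then be measured on the resulting binary pattern.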

Bibliographic details
Published in: Social network analysis and mining 2021-12, Vol.11 (1), p.74, Article 74
Main authors: Kirn, Spencer Lee, Hinders, Mark K.
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 1
container_start_page 74
container_title Social network analysis and mining
container_volume 11
creator Kirn, Spencer Lee
Hinders, Mark K.
description The key to identifying automated activity on social media is to isolate and analyze individual tweet storms that show how an account interacts with the twitterverse over time. In this work we propose the Dynamic Wavelet Fingerprint (DWFP) as a way to identify and flag this activity. Time-series representations of tweet storms are constructed using post metadata, and the DWFP converts these into binary images using a wavelet transform. To describe each tweet storm, features are extracted from the account metadata, tweet metadata, and DWFP images and then passed to a probabilistic classifier. We test three Bayesian Inference models: Multinomial Naïve Bayes, Gaussian Naïve Bayes, and Ensemble Naïve Bayes (ENB). Using Bayesian Inference structures allows us to propagate information between tweet storms by passing the posterior bot probability from one tweet storm as the prior assumption for the following tweet storm. For this proof-of-concept work we use a small, unambiguous dataset of 777 verified humans and 223 known bot accounts. We find the ENB model with four classifiers in the ensemble—decision tree, support vector machine, multi-layer perceptron, and logistic regression—provides the best results with a classification accuracy of 98.5%, and an f-score of 0.96 on the withheld validation data.
doi_str_mv 10.1007/s13278-021-00783-7
format Article
publisher Vienna: Springer Vienna
rights The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021
fulltext fulltext
identifier ISSN: 1869-5450
ispartof Social network analysis and mining, 2021-12, Vol.11 (1), p.74, Article 74
issn 1869-5450
1869-5469
language eng
recordid cdi_proquest_journals_2919613325
source ProQuest Central Essentials; ProQuest Central (Alumni Edition); ProQuest Central Student; ProQuest Central Korea; ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central
subjects Algorithms
Applications of Graph Theory and Complex Networks
Automation
Bayesian analysis
Behavior
Classifiers
Computer Science
Content creation
Data Mining and Knowledge Discovery
Datasets
Decision trees
Economics
False information
Game Theory
Humanities
Law
Machine learning
Metadata
Methodology of the Social Sciences
Multilayer perceptrons
Multilayers
Original Article
Probability
Social and Behav. Sciences
Social media
Social networks
Statistical analysis
Statistical inference
Statistics for Social Sciences
Storms
Support vector machines
Time series
Time use
Wavelet transforms
title Bayesian identification of bots using temporal analysis of tweet storms
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T01%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bayesian%20identification%20of%20bots%20using%20temporal%20analysis%20of%20tweet%20storms&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Kirn,%20Spencer%20Lee&rft.date=2021-12-01&rft.volume=11&rft.issue=1&rft.spage=74&rft.pages=74-&rft.artnum=74&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-021-00783-7&rft_dat=%3Cproquest_cross%3E2919613325%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2919613325&rft_id=info:pmid/&rfr_iscdi=true
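
The description field above states that the posterior bot probability from one tweet storm is passed on as the prior for the following tweet storm. A minimal sketch of that chaining with Bayes' rule is given here; it is not the authors' code. The function names, the starting prior of 0.5, and the likelihood values are hypothetical, and in the paper the likelihoods would come from the trained classifiers rather than being supplied by hand.

```python
def update_bot_probability(prior, like_bot, like_human):
    """One Bayes update: P(bot | storm) from P(bot) and the two likelihoods."""
    evidence = like_bot * prior + like_human * (1.0 - prior)
    if evidence == 0.0:
        return prior  # storm is impossible under both hypotheses; keep the prior
    return like_bot * prior / evidence


def propagate(storm_likelihoods, initial_prior=0.5):
    """Chain updates across storms, using each posterior as the next prior."""
    history = []
    p = initial_prior
    for like_bot, like_human in storm_likelihoods:
        p = update_bot_probability(p, like_bot, like_human)
        history.append(p)
    return history


if __name__ == "__main__":
    # Hypothetical pairs (P(features | bot), P(features | human)), one per storm.
    storms = [(0.6, 0.4), (0.7, 0.3), (0.8, 0.2)]
    for i, p in enumerate(propagate(storms), start=1):
        print(f"storm {i}: P(bot) = {p:.3f}")
```

With these made-up likelihoods the bot probability rises from 0.600 to 0.778 to 0.933 across the three storms, which shows how consistent evidence accumulates when each posterior is carried forward.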