Bayesian identification of bots using temporal analysis of tweet storms
The key to identifying automated activity on social media is to isolate and analyze individual tweet storms that show how an account interacts with the twitterverse over time. In this work we propose the Dynamic Wavelet Fingerprint (DWFP) as a way to identify and flag this activity. Time-series repr...
Published in: | Social network analysis and mining, 2021-12, Vol.11 (1), p.74, Article 74 |
---|---|
Main authors: | Kirn, Spencer Lee ; Hinders, Mark K. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 74 |
container_title | Social network analysis and mining |
container_volume | 11 |
creator | Kirn, Spencer Lee ; Hinders, Mark K. |
description | The key to identifying automated activity on social media is to isolate and analyze individual tweet storms that show how an account interacts with the twitterverse over time. In this work we propose the Dynamic Wavelet Fingerprint (DWFP) as a way to identify and flag this activity. Time-series representations of tweet storms are constructed using post metadata, and the DWFP converts these into binary images using a wavelet transform. To describe each tweet storm, features are extracted from the account metadata, tweet metadata, and DWFP images and then passed to a probabilistic classifier. We test three Bayesian Inference models: Multinomial Naïve Bayes, Gaussian Naïve Bayes, and Ensemble Naïve Bayes (ENB). Using Bayesian Inference structures allows us to propagate information between tweet storms by passing the posterior bot probability from one tweet storm as the prior assumption for the following tweet storm. For this proof-of-concept work we use a small, unambiguous dataset of 777 verified humans and 223 known bot accounts. We find the ENB model with four classifiers in the ensemble—decision tree, support vector machine, multi-layer perceptron, and logistic regression—provides the best results with a classification accuracy of 98.5%, and an f-score of 0.96 on the withheld validation data. |
doi_str_mv | 10.1007/s13278-021-00783-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2919613325</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2919613325</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2919613325</pqid></control><display><type>article</type><title>Bayesian identification of bots using temporal analysis of tweet storms</title><creator>Kirn, Spencer Lee ; Hinders, Mark K.</creator><identifier>ISSN: 1869-5450</identifier><identifier>EISSN: 1869-5469</identifier><identifier>DOI: 10.1007/s13278-021-00783-7</identifier><language>eng</language><publisher>Vienna: Springer Vienna</publisher><ispartof>Social network analysis and mining, 2021-12, Vol.11 (1), p.74, Article 74</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021</rights><lds50>peer_reviewed</lds50></display><links><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s13278-021-00783-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2919613325?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml></links><addata><au>Kirn, Spencer Lee</au><au>Hinders, Mark K.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bayesian identification of bots using temporal analysis of tweet storms</atitle><jtitle>Social network analysis and mining</jtitle><stitle>Soc. Netw. Anal. Min</stitle><date>2021-12-01</date><risdate>2021</risdate><volume>11</volume><issue>1</issue><spage>74</spage><pages>74-</pages><artnum>74</artnum><issn>1869-5450</issn><eissn>1869-5469</eissn><cop>Vienna</cop><pub>Springer Vienna</pub><doi>10.1007/s13278-021-00783-7</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1869-5450 |
ispartof | Social network analysis and mining, 2021-12, Vol.11 (1), p.74, Article 74 |
issn | 1869-5450 ; 1869-5469 |
language | eng |
recordid | cdi_proquest_journals_2919613325 |
source | ProQuest Central Essentials; ProQuest Central (Alumni Edition); ProQuest Central Student; ProQuest Central Korea; ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central |
subjects | Algorithms ; Applications of Graph Theory and Complex Networks ; Automation ; Bayesian analysis ; Behavior ; Classifiers ; Computer Science ; Content creation ; Data Mining and Knowledge Discovery ; Datasets ; Decision trees ; Economics ; False information ; Game Theory ; Humanities ; Law ; Machine learning ; Metadata ; Methodology of the Social Sciences ; Multilayer perceptrons ; Multilayers ; Original Article ; Probability ; Social and Behav. Sciences ; Social media ; Social networks ; Statistical analysis ; Statistical inference ; Statistics for Social Sciences ; Storms ; Support vector machines ; Time series ; Time use ; Wavelet transforms |
title | Bayesian identification of bots using temporal analysis of tweet storms |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T01%3A24%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bayesian%20identification%20of%20bots%20using%20temporal%20analysis%20of%20tweet%20storms&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Kirn,%20Spencer%20Lee&rft.date=2021-12-01&rft.volume=11&rft.issue=1&rft.spage=74&rft.pages=74-&rft.artnum=74&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-021-00783-7&rft_dat=%3Cproquest_cross%3E2919613325%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2919613325&rft_id=info:pmid/&rfr_iscdi=true |
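
The description field in this record states that the posterior bot probability from one tweet storm is passed on as the prior for the following tweet storm. The Python sketch below illustrates only that chaining step with a plain two-class Bayes update. It is not the authors' implementation: the function names, the per-storm likelihood values, and the choice of the dataset's bot share (223 of 1000 accounts) as the starting prior are all assumptions made for illustration.

```python
def update_bot_probability(prior, likelihood_bot, likelihood_human):
    """One Bayes update: P(bot | storm) from the prior P(bot) and two likelihoods."""
    evidence = likelihood_bot * prior + likelihood_human * (1.0 - prior)
    if evidence == 0.0:
        # The storm is impossible under both hypotheses; keep the prior unchanged.
        return prior
    return likelihood_bot * prior / evidence


def propagate(prior, storm_likelihoods):
    """Chain updates over successive storms, feeding each posterior forward as the next prior."""
    history = []
    p = prior
    for likelihood_bot, likelihood_human in storm_likelihoods:
        p = update_bot_probability(p, likelihood_bot, likelihood_human)
        history.append(p)
    return history


# Hypothetical (P(storm | bot), P(storm | human)) pairs for three consecutive storms.
# In the paper these would come from a trained classifier; here they are made up.
storms = [(0.6, 0.4), (0.7, 0.3), (0.8, 0.2)]

# Assumed starting prior: the share of bot accounts in the dataset described above.
initial_prior = 223 / 1000

history = propagate(initial_prior, storms)
for index, probability in enumerate(history, start=1):
    print(f"storm {index}: P(bot) = {probability:.3f}")
```

Because each hypothetical storm is more likely under the bot hypothesis than the human one, the bot probability rises with every update; a storm favouring the human hypothesis would pull it back down.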