SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection

Twitter has become a major social media platform since its launch in 2006, while complaints about bot accounts have increased recently. Although extensive research efforts have been made, the state-of-the-art bot detection methods fall short of generalizability and adaptability. Specifically, pre...

Bibliographic details
Published in:	arXiv.org, 2021-08
Main authors:	Feng, Shangbin ; Wan, Herun ; Wang, Ningnan ; Li, Jundong ; Luo, Minnan
Format:	Article
Language:	eng
Subjects:	Computer Science - Social and Information Networks ; Datasets ; Digital media ; Learning ; Representations ; Semantics ; Social networks ; Software agents
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Feng, Shangbin ; Wan, Herun ; Wang, Ningnan ; Li, Jundong ; Luo, Minnan
description	Twitter has become a major social media platform since its launch in 2006, while complaints about bot accounts have increased recently. Although extensive research efforts have been made, the state-of-the-art bot detection methods fall short of generalizability and adaptability. Specifically, previous bot detectors leverage only a small fraction of user information and are often trained on datasets that cover only a few types of bots. As a result, they fail to generalize to real-world scenarios on the Twittersphere where different types of bots co-exist. Additionally, bots on Twitter are constantly evolving to evade detection. Previous efforts, although effective once in their context, fail to adapt to new generations of Twitter bots. To address the two challenges of Twitter bot detection, we propose SATAR, a self-supervised representation learning framework for Twitter users, and apply it to the task of bot detection. In particular, SATAR generalizes by jointly leveraging the semantics, property and neighborhood information of a specific user. Meanwhile, SATAR adapts by pre-training on a massive number of self-supervised users and fine-tuning on detailed bot detection scenarios. Extensive experiments demonstrate that SATAR outperforms competitive baselines on different bot detection datasets of varying information completeness and collection time. SATAR is also shown to generalize in real-world scenarios and adapt to evolving generations of social media bots.
doi_str_mv	10.48550/arxiv.2106.13089
format	Article
fullrecord	<record>
  <control>
    <sourceid>proquest_arxiv</sourceid>
    <recordid>TN_cdi_arxiv_primary_2106_13089</recordid>
    <sourceformat>XML</sourceformat>
    <sourcesystem>PC</sourcesystem>
    <sourcerecordid>2544993936</sourcerecordid>
    <originalsourceid>FETCH-LOGICAL-a526-1ff115c6ccfc70e38825e2581c2e60b8ffb99b231e6f689c29fe8cb45f1928aa3</originalsourceid>
    <sourcetype>Open Access Repository</sourcetype>
    <iscdi>true</iscdi>
    <recordtype>article</recordtype>
    <pqid>2544993936</pqid>
  </control>
  <display>
    <type>article</type>
    <title>SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection</title>
    <source>arXiv.org</source>
    <source>Free E- Journals</source>
    <creator>Feng, Shangbin ; Wan, Herun ; Wang, Ningnan ; Li, Jundong ; Luo, Minnan</creator>
    <identifier>EISSN: 2331-8422</identifier>
    <identifier>DOI: 10.48550/arxiv.2106.13089</identifier>
    <language>eng</language>
    <publisher>Ithaca: Cornell University Library, arXiv.org</publisher>
    <subject>Computer Science - Social and Information Networks ; Datasets ; Digital media ; Learning ; Representations ; Semantics ; Social networks ; Software agents</subject>
    <ispartof>arXiv.org, 2021-08</ispartof>
    <rights>2021. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights>
    <rights>http://creativecommons.org/licenses/by-sa/4.0</rights>
    <oa>free_for_read</oa>
    <woscitedreferencessubscribed>false</woscitedreferencessubscribed>
  </display>
  <links>
    <backlink>$$Uhttps://doi.org/10.48550/arXiv.2106.13089$$DView paper in arXiv$$Hfree_for_read</backlink>
    <backlink>$$Uhttps://doi.org/10.1145/3459637.3481949$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink>
  </links>
  <delivery>
    <delcategory>Remote Search Resource</delcategory>
    <fulltext>fulltext</fulltext>
  </delivery>
  <addata>
    <au>Feng, Shangbin</au>
    <au>Wan, Herun</au>
    <au>Wang, Ningnan</au>
    <au>Li, Jundong</au>
    <au>Luo, Minnan</au>
    <format>journal</format>
    <genre>article</genre>
    <ristype>JOUR</ristype>
    <atitle>SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection</atitle>
    <jtitle>arXiv.org</jtitle>
    <date>2021-08-27</date>
    <risdate>2021</risdate>
    <eissn>2331-8422</eissn>
    <cop>Ithaca</cop>
    <pub>Cornell University Library, arXiv.org</pub>
    <doi>10.48550/arxiv.2106.13089</doi>
    <oa>free_for_read</oa>
  </addata>
</record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2021-08
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2106_13089
source	arXiv.org; Free E- Journals
subjects	Computer Science - Social and Information Networks ; Datasets ; Digital media ; Learning ; Representations ; Semantics ; Social networks ; Software agents
title	SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T13%3A49%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SATAR:%20A%20Self-supervised%20Approach%20to%20Twitter%20Account%20Representation%20Learning%20and%20its%20Application%20in%20Bot%20Detection&rft.jtitle=arXiv.org&rft.au=Feng,%20Shangbin&rft.date=2021-08-27&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2106.13089&rft_dat=%3Cproquest_arxiv%3E2544993936%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2544993936&rft_id=info:pmid/&rfr_iscdi=true