Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam


Bibliographic details
Published in: arXiv.org 2022-04
Main authors: Tang, Siyuan; Xianghang Mi; Li, Ying; Wang, XiaoFeng; Chen, Kai
Format: Article
Language: eng
Online access: Full text
container_title arXiv.org
creator Tang, Siyuan
Xianghang Mi
Li, Ying
Wang, XiaoFeng
Chen, Kai
description With its critical role in business and service delivery through mobile devices, SMS (Short Message Service) has long been abused for spamming, which is still on the rise today, possibly due to the emergence of A2P bulk messaging. The effort to control SMS spam has been hampered by the lack of up-to-date information about illicit activities. In our research, we proposed a novel solution to collect recent SMS spam data, at a large scale, from Twitter, where users voluntarily report the spam messages they receive. For this purpose, we designed and implemented SpamHunter, an automated pipeline to discover SMS spam reporting tweets and extract message content from the attached screenshots. Leveraging SpamHunter, we collected from Twitter a dataset of 21,918 SMS spam messages in 75 languages, spanning over four years. To the best of our knowledge, this is the largest SMS spam dataset ever made public. More importantly, SpamHunter enables us to continuously monitor emerging SMS spam messages, which facilitates the ongoing effort to mitigate SMS spamming. We also performed an in-depth measurement study that sheds light on new trends in spammers' strategies, infrastructure, and spam campaigns. We further used our spam SMS data to evaluate the robustness of the spam countermeasures put in place by the SMS ecosystem, including anti-spam services, bulk SMS services, and text messaging apps. Our evaluation shows that such protection cannot effectively handle those spam samples: it either introduces significant false positives or misses a large number of newly reported spam messages.
doi_str_mv 10.48550/arxiv.2204.01233
format Article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 2022-04-04
rights 2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
backlink https://doi.org/10.48550/arXiv.2204.01233 (View paper in arXiv)
backlink https://doi.org/10.1145/3548606.3559351 (View published paper; access to full text may be restricted)
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-04
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2204_01233
source arXiv.org; Free E- Journals
subjects Computer Science - Cryptography and Security
Datasets
Depth measurement
Electronic devices
Evaluation
Messages
Service introduction
Short message service
Spamming
Text messaging
title Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T00%3A30%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clues%20in%20Tweets:%20Twitter-Guided%20Discovery%20and%20Analysis%20of%20SMS%20Spam&rft.jtitle=arXiv.org&rft.au=Tang,%20Siyuan&rft.date=2022-04-04&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2204.01233&rft_dat=%3Cproquest_arxiv%3E2647056976%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2647056976&rft_id=info:pmid/&rfr_iscdi=true