Learning textual features for Twitter spam detection: A systematic literature review

Bibliographic details
Published in: Expert Systems with Applications, 2023-10, Vol. 228, p. 120366, Article 120366
Main authors: Bazzaz Abkenar, Sepideh; Haghi Kashani, Mostafa; Akbari, Mohammad; Mahdipour, Ebrahim
Format: Article
Language: English (eng)
Description
Summary: Background: With the rise of Internet access and mobile devices around the globe, more people are using social networks for collaboration and for receiving real-time information. Twitter, the microblogging site that is becoming a critical source of communication, has also attracted spammers seeking to distract users. Researchers have introduced various defense techniques to detect spam and combat spammers' activities, and many of these novel techniques have greatly enhanced spam detection performance. Objective: The purpose of this paper is to identify, taxonomically classify, and compare current Twitter spam detection approaches in a systematic way. Method: This study presents a comprehensive Systematic Literature Review (SLR) of spam detection on Twitter, covering the 70 most relevant papers published between 2010 and October 2022. The literature analysis reveals that most existing Twitter spam detection techniques are based on textual content and messages (tweets) and rely on Machine Learning (ML)-based algorithms. The major differences among these ML approaches, which use various classification and clustering algorithms, lie in their feature selection methods. Hence, we propose a classification based on different feature selection analyses, namely content analysis, user analysis, tweet analysis, network analysis, and hybrid analysis. Results: Various parameters are identified to investigate the Twitter spam detection approaches, and each paper was examined to determine its research methodology and to present comparative studies of current approaches. Conclusion: This paper demonstrates that existing Twitter spam detection approaches face several open issues, including scalability, streaming data analysis, and processing. The most obvious unresolved issues are spam drift and non-English tweets.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.120366
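
Illustration: the summary's "content analysis" category groups approaches that derive features from the tweet text itself. The record does not list which features the reviewed papers use, so the following is only a minimal sketch under assumptions: the function name `content_features`, the regular expressions, and the chosen features (URL, hashtag, and mention counts, word and digit counts, uppercase ratio) are hypothetical examples and are not taken from the article.

```python
import re

# Hypothetical patterns for illustration; real tweet tokenisation is more involved.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def content_features(tweet: str) -> dict:
    """Return a few illustrative content-based features for one tweet."""
    letters = [c for c in tweet if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "num_urls": len(URL_RE.findall(tweet)),
        "num_hashtags": len(HASHTAG_RE.findall(tweet)),
        "num_mentions": len(MENTION_RE.findall(tweet)),
        "num_words": len(tweet.split()),
        "num_digits": sum(c.isdigit() for c in tweet),
        # Share of alphabetic characters that are uppercase (0.0 for empty text).
        "upper_ratio": upper / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    example = "WIN a FREE phone now!!! http://example.com #free #win @user 100"
    print(content_features(example))
```

A feature dictionary of this kind could then be passed to any classification or clustering algorithm; which algorithm and which features work best is exactly what the reviewed papers differ on.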