X Bot Detection Using One-Class Classification Methods with Isolation Forest Algorithm


Bibliographic details
Published in: International journal on advanced science, engineering and information technology, 2024-08, Vol.14 (4), p.1233-1239
Main authors: Miftahuddin, Yusup; Al-Ghifary, Muhammad Haydar
Format: Article
Language: English
Description
Abstract: X bots pose a significant issue in the social media landscape, with many shared links originating from bot-like accounts. This study applies the Isolation Forest algorithm to identify anomalies such as bots by analyzing X account details. It uses a dataset that merges data from Botometer with supplementary metrics such as ‘average tweets per day’ and ‘account age in days’, contributed by David Martín Gutiérrez. This dataset was adopted because of the increasing difficulty of accessing the X API. It comprises 37,438 instances: 25,013 labeled as human accounts and 12,425 labeled as bot accounts. Pre-processing removes irrelevant features, and the dataset is split into training, validation, and test sets in a 70:15:15 ratio. Hyperparameter and threshold tuning on the training set identifies the best configuration for this dataset (n_estimators: 50, contamination: 0.5, bootstrap: True), which achieves a training-set F1-score of 0.211001. Despite these optimization efforts, the Isolation Forest model's performance remains low. The test-set evaluation yields precision, recall, and F1-score values of 0.1801, 0.2795, and 0.2190, respectively, with a ROC AUC score of 0.3272. While the Isolation Forest algorithm shows promise in detecting X bots, its performance on this dataset is limited, and it may not be the most suitable algorithm for this bot detection task. Future work will explore techniques to improve bot detection performance.
ISSN:2088-5334
DOI:10.18517/ijaseit.14.4.19364
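
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' code. It uses only the Python standard library, and the helper names (`split_sizes`, `f1_score`) and the rounding rule for the 70:15:15 split are assumptions made for illustration. It recomputes the F1-score from the reported test-set precision and recall, and derives the class balance and approximate split sizes from the reported instance counts.

```python
# Figures quoted in the abstract (test-set metrics and dataset counts).
N_TOTAL, N_HUMAN, N_BOT = 37_438, 25_013, 12_425
PRECISION, RECALL, F1_REPORTED = 0.1801, 0.2795, 0.2190


def split_sizes(n, ratios=(0.70, 0.15, 0.15)):
    """Return (train, validation, test) sizes that sum to n.

    Assumed rounding rule: floor the training and validation sizes
    and give the remainder to the test set. The paper's exact rule
    is not stated in the abstract.
    """
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return n_train, n_val, n - n_train - n_val


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Share of accounts labeled as bots in the full dataset.
bot_share = N_BOT / N_TOTAL

if __name__ == "__main__":
    print("class counts add up:", N_HUMAN + N_BOT == N_TOTAL)
    print("bot share:", round(bot_share, 4))
    print("split sizes:", split_sizes(N_TOTAL))
    print("recomputed F1:", round(f1_score(PRECISION, RECALL), 4))
```

Under these assumptions the recomputed F1-score agrees with the reported 0.2190 to within rounding, and the two class counts sum to the stated 37,438 instances. Bots make up roughly a third of the dataset, which is lower than the tuned contamination value of 0.5.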