A comprehensive comparison study of ML models for multistage APT detection: focus on data preprocessing and resampling

Bibliographic details
Published in: The Journal of Supercomputing, 2024, Vol. 80 (10), pp. 14143-14179
Authors: Dau, Dinh-Dong; Lee, Soojin; Kim, Hanseok
Format: Article
Language: English
Description
Abstract: Advanced persistent threats (APTs) present a significant cybersecurity challenge that calls for innovative detection methods. This study stands out by integrating advanced data preparation with strategies for handling class imbalance, tailored to the SCVIC-APT-2021 dataset. We combine resampling, cost-sensitive learning, and ensemble methods with machine learning and deep learning models such as XGBoost, LightGBM, and artificial neural networks (ANNs) to improve APT detection. Drawing on the MITRE ATT&CK framework, our strategy targets each stage of an APT attack, which significantly increases detection accuracy. Notably, we achieved a Macro F1-score of 95.20% with XGBoost and 96.67% with LightGBM, along with significant improvements in the area under the precision–recall curve for both models. Our exploration of the SCVIC-APT-2021 dataset marks a step forward in APT detection research, with important implications for future cybersecurity developments.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-024-06010-2
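The abstract reports its results as Macro F1-scores. Macro F1 is the unweighted mean of the per-class F1 scores, so a rare APT stage carries as much weight as the dominant benign class. This makes it a more informative metric than plain accuracy on imbalanced multistage data. The following is a minimal pure-Python sketch of the metric itself, not the authors' evaluation code. The function name `macro_f1` and the toy labels are hypothetical illustrations and are not data from SCVIC-APT-2021.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (hypothetical helper).

    Classes are taken from the union of y_true and y_pred; any zero
    denominator yields 0.0 instead of raising ZeroDivisionError.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class_f1 = []
    for label in labels:
        # One-vs-rest counts for the current class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class_f1.append(2 * precision * recall / denom if denom else 0.0)
    # Macro averaging: every class counts equally, regardless of its size.
    return sum(per_class_f1) / len(per_class_f1)


# Hypothetical toy labels (not from SCVIC-APT-2021): 8 benign flows, 2 from one APT stage.
y_true = ["benign"] * 8 + ["exfiltration"] * 2
# A degenerate classifier that predicts "benign" for everything.
y_pred = ["benign"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

In this toy case the classifier never detects the attack stage. It still reaches 80% accuracy, but its Macro F1 drops to about 0.444 because the missed class contributes an F1 of 0. For real experiments, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.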