Network Traffic Classification Based on SD Sampling and Hierarchical Ensemble Learning

With the increase in cyber threats in recent years, there have been more forms of demand for network security protection measures. Network traffic classification technology is used to adapt to the dynamic threat environment. However, network traffic has a natural unbalanced class distribution proble...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Security and communication networks 2023, Vol.2023, p.1-16
Hauptverfasser: Qin, Jian, Han, Xueying, Wang, Chonghua, Hu, Qing, Jiang, Bo, Zhang, Chen, Lu, Zhigang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the increase in cyber threats in recent years, there have been more forms of demand for network security protection measures. Network traffic classification technology is used to adapt to the dynamic threat environment. However, network traffic has a natural unbalanced class distribution problem, and the single model leads to the low accuracy and high false-positive rate of the traditional detection model. Given the above two problems, this paper proposes a new dataset balancing method named SD sampling based on the SMOTE algorithm. Different from the SMOTE algorithm, this method divides the sample into two types that are easy and difficult to classify and only balances the difficult-to-classify sample, which not only overcomes the SMOTE’s overgeneralization but also combines the idea of oversampling and undersampling. In addition, a two-layer structure combined with XGBoost and the random forest is proposed for multiclassification of anomalous traffic, since using a hierarchical structure can better classify minority abnormal traffic. This paper conducts experiments on the CICIDS2017 dataset. The results show that the classification accuracy of the proposed model is more than 99.70% and that the false-positive rate is less than 0.34%, indicating that the proposed model is better than traditional models.
ISSN:1939-0114
1939-0122
DOI:10.1155/2023/4374385