Enhancing network intrusion detection: a dual-ensemble approach with CTGAN-balanced data and weak classifiers

Bibliographic details
Published in:	The Journal of supercomputing 2024, Vol.80 (11), p.16301-16333
Main authors:	Soflaei, Mohammad Reza Abbaszadeh Bavil; Salehpour, Arash; Samadzamini, Karim
Format:	Article
Language:	eng
Subjects:	Algorithms; Classification; Classifiers; Compilers; Computer Science; Cybersecurity; Datasets; Decision trees; Ensemble learning; Generative adversarial networks; Internet of Things; Interpreters; Intrusion detection systems; Machine learning; Processor Architectures; Programming Languages; System effectiveness
Online access:	Full text

Description
Summary:	With the expansion of the Internet, Internet of Things devices, and related services, effective intrusion detection systems are vital in cybersecurity. This study presents a significant advancement in cybersecurity by leveraging ensemble learning techniques alongside generative adversarial networks, proposing a novel framework for network behavior classification using the UNSW-NB15 dataset. Similar to any other real-world dataset, the UNSW-NB15 dataset poses inherent challenges of data imbalance, with significantly fewer instances of intrusion compared to normal network behavior. Our main contribution to the existing literature is the introduction of a conditional tabular generative adversarial network (CTGAN), aimed at addressing the existing issue of data imbalance in the dataset. In previous approaches, this issue was often overlooked; however, the proposed framework achieves a substantial improvement in model performance by balancing this dataset. Through training three shallow binary classification algorithms (decision trees, logistic regression, and Gaussian naive Bayes) on both the CTGAN-balanced data and the original imbalanced dataset, we uncover remarkable improvements in identifying network intrusion. Our study employs a novel two-stage label-wise ensembling process, notably resulting in a final XGBoost meta-classifier. The ultimate achievement of our framework demonstrates 98% accuracy for binary classification and 95% for multi-class classification, outperforming existing state-of-the-art models. By offering a robust framework for effective intrusion detection, this work marks a substantial step forward in addressing data imbalance challenges within the UNSW-NB15 dataset.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-024-06108-7
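
As a rough illustration of the balancing step mentioned in the summary, the sketch below computes how many synthetic rows each class would need in order to match the majority class. These are the counts one might then request from a conditional generator such as CTGAN. The function name and the toy labels are hypothetical, and this record does not state the balancing target the authors actually used, so equalizing to the majority class is an assumption.

```python
from collections import Counter


def synthetic_rows_needed(labels):
    """Return, per class, how many extra rows would raise it to the majority-class count.

    Assumption: "balanced" means every class matches the largest class.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical toy labels; names and counts are illustrative only,
# not statistics from the UNSW-NB15 dataset.
labels = ["normal"] * 8 + ["dos"] * 3 + ["exploit"] * 1
needed = synthetic_rows_needed(labels)
print(needed)
```

Classes already at the majority count map to zero, so only the under-represented classes would need generated rows.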