Towards Data-Driven Network Intrusion Detection Systems: Features Dimensionality Reduction and Machine Learning

Bibliographic details
Published in: International journal of interactive mobile technologies 2022-07, Vol. 16 (14), pp. 123-135
Main authors: Maabreh, Majdi; Obeidat, Ibrahim; Abu Elsoud, Esraa; Alnajjar, Asma; Alzyoud, Rahaf; Darwish, Omar
Format: Article
Language: English
Description
Abstract: Cyberattacks have increased in tandem with the exponential expansion of computer networks and network applications throughout the world. In this study, we evaluate and compare four feature selection methods, seven classical machine learning algorithms, and a deep learning algorithm on one million randomly sampled instances of the CSE-CIC-IDS2018 big dataset for network intrusions. The dataset was preprocessed and cleaned, and all learning algorithms were trained on the original feature values. The feature selection methods highlighted the importance of features related to the forward direction (FWD) and of two flow measures (FLOW) in predicting the binary traffic type: benign or attack. Furthermore, the results revealed no significant difference in performance between models trained on all features and models trained on the top 30 features selected by any of the four feature selection techniques used in this experiment. Moreover, ML models trained on only four features may perform similarly to models trained on all features, which may yield preferable models in terms of complexity, explainability, and deployment at scale. Finally, by choosing the four features selected unanimously by all methods instead of all traffic features, training time may be reduced by 10% to 50% of the training time on all features.
ISSN: 1865-7923
DOI: 10.3991/ijim.v16i14.30197
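The abstract does not name its four feature selection methods, and it describes the reduced set only as "four unanimity features". One plausible reading is the set of features that every selection method ranks among its top k. The Python sketch below illustrates that reading under stated assumptions: the two filter scorers (absolute Pearson correlation with the label, and plain feature variance), the toy data, and all function names are hypothetical stand-ins, not the authors' code and not CSE-CIC-IDS2018 features.

```python
# Minimal sketch of "unanimity" feature selection: keep only the features
# that every selection method places in its top k.
# ASSUMPTIONS (not from the paper): the two scorers, the toy data, and all
# names below are hypothetical stand-ins chosen for illustration.
from statistics import mean, pvariance
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float], Sequence[int]], float]


def abs_correlation(x: Sequence[float], y: Sequence[int]) -> float:
    """Supervised filter: |Pearson correlation| between a feature and a 0/1 label."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def variance(x: Sequence[float], y: Sequence[int]) -> float:
    """Unsupervised filter: population variance of the feature (label ignored)."""
    return pvariance(x)


def top_k(columns: Dict[str, List[float]], y: Sequence[int],
          scorer: Scorer, k: int) -> List[str]:
    """Rank feature names by score, highest first, and keep the best k."""
    ranked = sorted(columns, key=lambda name: scorer(columns[name], y), reverse=True)
    return ranked[:k]


def unanimous_features(columns: Dict[str, List[float]], y: Sequence[int],
                       scorers: List[Scorer], k: int) -> List[str]:
    """Return, sorted by name, the features that are in every scorer's top k."""
    selections = [set(top_k(columns, y, scorer, k)) for scorer in scorers]
    return sorted(set.intersection(*selections))


if __name__ == "__main__":
    # Toy binary labels: 0 = benign, 1 = attack (hypothetical data).
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = {
        "informative": [1, 2, 1, 2, 9, 10, 9, 10],  # separates the classes well
        "weak": [1, 3, 2, 4, 3, 5, 4, 6],           # partially separates them
        "noise": [5, 1, 5, 1, 5, 1, 5, 1],          # high variance, no signal
    }
    print(top_k(columns, labels, abs_correlation, 2))  # ['informative', 'weak']
    print(top_k(columns, labels, variance, 2))         # ['informative', 'noise']
    print(unanimous_features(columns, labels, [abs_correlation, variance], 2))
    # ['informative']
```

Because the two scorers disagree (variance favors the high-variance but uninformative column), only the feature that both rank in their top 2 survives. Under this reading, the same intersection would be taken across four methods and the dataset's traffic features, and a model would then be trained on the surviving columns only.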