Intrusion detection using Highest Wins feature selection algorithm

Bibliographic details
Published in: Neural computing & applications 2021-08, Vol.33 (16), p.9805-9816
Main authors: Mohammad, Rami Mustafa A., Alsmadi, Mutasem K.
Format: Article
Language: English
Online access: Full text
Description
Summary: The rapid growth of the Internet has stimulated the development of intelligent data mining systems for detecting intrusion attacks. The performance of such systems can suffer from the large datasets used in the learning phase. Selecting an appropriate subset of features from the training data is therefore an essential step when building data mining classification models, and the resulting reduced feature set should maintain or even improve classification performance. This article proposes an innovative feature selection algorithm called "the Highest Wins" (HW). To evaluate the generalization ability of HW, it was used to build naïve Bayes classification models on 10 benchmark datasets. The results were compared against two well-known strategies, chi-square and information gain. The experiments confirmed that HW is competitive on evaluation measures such as recall, precision, and error rate while significantly reducing the number of selected features. HW was further used to build naïve Bayes and decision tree intrusion detection classifiers on the well-known Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) dataset. The results were promising not only in overall performance but also in the time needed to build the classification model.
ISSN: 0941-0643, 1433-3058
DOI:10.1007/s00521-021-05745-w
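
Illustrative note: the summary names chi-square and information gain as the two baseline feature selection strategies. This record does not describe how the HW algorithm itself works, so the sketch below covers only those two baselines, scoring categorical features against a class label and ranking them. The function names, the toy feature names (`flag`, `protocol`), and the four-row dataset are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    joint_counts = Counter(zip(feature, labels))
    score = 0.0
    for f_value, n_f in feature_counts.items():
        for c_value, n_c in class_counts.items():
            expected = n_f * n_c / n  # count expected under independence
            observed = joint_counts.get((f_value, c_value), 0)
            score += (observed - expected) ** 2 / expected
    return score


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class entropy obtained by splitting on the feature."""
    n = len(labels)
    groups = {}
    for f_value, c_value in zip(feature, labels):
        groups.setdefault(f_value, []).append(c_value)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, scorer):
    """Return feature names ordered from highest to lowest score."""
    scores = {name: scorer(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 'flag' separates the classes, 'protocol' does not.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "flag": ["S0", "S0", "SF", "SF"],
    "protocol": ["tcp", "udp", "tcp", "udp"],
}

chi_ranking = rank_features(columns, labels, chi_square_score)
ig_ranking = rank_features(columns, labels, information_gain)
```

In a filter-style workflow, the top-ranked features from either ranking would then be passed to a classifier such as naïve Bayes; how many to keep is a separate choice that this sketch does not address.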