Robust Intrusion Detection for Industrial Control Systems Using Improved Autoencoder and Bayesian Gaussian Mixture Model

Bibliographic details
Published in: Mathematics (Basel), 2023-04, Vol. 11 (9), p. 2048
Main authors: Wang, Chao; Liu, Hongri; Li, Chao; Sun, Yunxiao; Wang, Wenting; Wang, Bailing
Format: Article
Language: English
Online access: Full text
Description
Abstract: Machine learning-based intrusion detection systems are an effective way to cope with the increasing security threats faced by industrial control systems. Because attack data are hard and expensive to obtain, it is more reasonable to develop a model trained only on normal data. However, both high-dimensional data and the presence of outliers in the training set degrade efficiency. In this research, we present a hybrid intrusion detection method to overcome these two problems. First, we create an improved autoencoder that incorporates the deep support vector data description (Deep SVDD) loss into autoencoder training. By combining the Deep SVDD loss with the reconstruction loss, the novel autoencoder learns a more compact latent representation from high-dimensional data. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is then used to remove potential outliers from the training data. Finally, a Bayesian Gaussian mixture model is used to identify anomalies: it learns the distribution of the filtered training data and uses the resulting probabilities to classify samples as normal or anomalous. We conducted a series of experiments on two intrusion detection datasets to assess performance. The proposed model performs better than the baseline methods when dealing with high-dimensional and contaminated data.
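The combined training objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two terms are combined, so the weighted sum below, the `svdd_weight` parameter, and all function names are assumptions made for illustration only, not the authors' implementation. The Deep SVDD term is taken in its common one-class form, the squared distance of the latent code from a fixed center.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def combined_loss(
    x: Sequence[float],
    x_reconstructed: Sequence[float],
    z: Sequence[float],
    center: Sequence[float],
    svdd_weight: float = 1.0,  # hypothetical trade-off weight, not from the source
) -> float:
    """Per-sample loss: reconstruction error plus a weighted Deep SVDD term.

    x               -- input sample
    x_reconstructed -- decoder output for x
    z               -- latent code produced by the encoder for x
    center          -- fixed hypersphere center in latent space
    """
    reconstruction = squared_distance(x, x_reconstructed)  # keeps z informative
    svdd = squared_distance(z, center)                     # pulls z toward the center
    return reconstruction + svdd_weight * svdd


# Toy values chosen by hand; a real model would produce x_hat and z.
x = [1.0, 2.0, 3.0]
x_hat = [1.0, 2.0, 2.0]  # reconstruction term = 1.0
z = [0.5, 0.5]
c = [0.0, 0.5]           # Deep SVDD term = 0.25
loss = combined_loss(x, x_hat, z, c, svdd_weight=2.0)
print(loss)  # 1.5
```

In this sketch, a larger `svdd_weight` favors a tighter latent cluster at the cost of reconstruction fidelity. The later stages described in the abstract (DBSCAN filtering and the Bayesian Gaussian mixture model) would operate on the latent codes `z` and are not shown here.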
ISSN: 2227-7390
DOI: 10.3390/math11092048