GMM with parameters initialization based on SVD for network threat detection

In the field of security, the data labels are unknown or the labels are too expensive to label, so that clustering methods are used to detect the threat behavior contained in the big data. The most widely used probabilistic clustering model is Gaussian Mixture Models(GMM), which is flexible and powe...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of intelligent & fuzzy systems 2021-01, Vol.40 (1), p.477-490
Hauptverfasser: Xu, Yanping, Ye, Tingcong, Wang, Xin, Lai, Yuping, Qiu, Jian, Zhang, Lingjun, Zhang, Xia
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the field of security, the data labels are unknown or the labels are too expensive to label, so that clustering methods are used to detect the threat behavior contained in the big data. The most widely used probabilistic clustering model is Gaussian Mixture Models(GMM), which is flexible and powerful to apply prior knowledge for modelling the uncertainty of the data. Therefore, in this paper, we use GMM to build the threat behavior detection model. Commonly, Expectation Maximization (EM) and Variational Inference (VI) are used to estimate the optimal parameters of GMM. However, both EM and VI are quite sensitive to the initial values of the parameters. Therefore, we propose to use Singular Value Decomposition (SVD) to initialize the parameters. Firstly, SVD is used to factorize the data set matrix to get the singular value matrix and singular matrices. Then we calculate the number of the components of GMM by the first two singular values in the singular value matrix and the dimension of the data. Next, other parameters of GMM, such as the mixing coefficients, the mean and the covariance, are calculated based on the number of the components. After that, the initialization values of the parameters are input into EM and VI to estimate the optimal parameters of GMM. The experiment results indicate that our proposed method performs well on the parameters initialization of GMM clustering using EM and VI for estimating parameters.
ISSN:1064-1246
1875-8967
DOI:10.3233/JIFS-200066