Detection method of domain names generated by DGAs based on semantic representation and deep neural network



Bibliographic details
Published in: Computers & Security, 2019-08, Vol. 85, pp. 77-88
Main authors: Xu, Congyuan; Shen, Jizhong; Du, Xin
Format: Article
Language: English
Description
Abstract: Botnets have become one of the main threats to cyberspace security. More and more bots use domain generation algorithms (DGAs) to generate malicious domain names for communicating with Command & Control (C&C) servers. A well-designed DGA can bypass traditional detection methods such as sinkholing and rule-based filtering, which raises new challenges for cyberspace security. In machine learning, the n-gram is a semantic model that characterizes the relationships among neighboring morphemes, while deep convolutional neural networks are robust at processing information with translation-invariant properties. In this paper, we combine n-grams with a deep convolutional neural network and propose a novel n-gram combined character-based domain classification (n-CBDC) model. The n-CBDC model runs end to end and requires neither hand-extracted features nor domain name system (DNS) contextual information: it takes only the domain name itself as input and automatically estimates the probability that the domain name was generated by a DGA. Experiments on real-world data show that the proposed method effectively detects DGA-generated domain names, with a 98.69% average detection rate and a 0.9829 average F-measure, and that it significantly outperforms state-of-the-art methods on pronounceable and wordlist-based DGA domain names, with detection rates above 93.89%. The proposed detection method is therefore robust and widely applicable across the various types of DGA-generated domain names.
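The abstract combines two ideas: character n-grams as input tokens and a convolutional network whose pooled features do not depend on where a pattern occurs in the domain name. The following is a minimal sketch of those two ideas under stated assumptions, not the authors' n-CBDC implementation. The abstract does not specify the n-gram order, the input length, how the TLD is handled, or the network architecture, so the helpers `strip_tld`, `char_ngrams`, `build_vocab`, `encode`, and `conv_maxpool`, the choice of bigrams, the length of 8, and the hand-set kernel are all hypothetical. A real model would learn embeddings and many filters instead of using one fixed sparse kernel.

```python
from typing import Dict, Iterable, List, Sequence

PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unseen n-grams


def strip_tld(domain: str) -> str:
    """Assumption: drop only the last label, e.g. 'example.com' -> 'example'."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[:-1]) if len(labels) > 1 else labels[0]


def char_ngrams(domain: str, n: int) -> List[str]:
    """Overlapping character n-grams of the domain with its TLD removed."""
    text = strip_tld(domain)
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(domains: Iterable[str], n: int) -> Dict[str, int]:
    """Assign an integer index to every n-gram seen in the training domains."""
    vocab: Dict[str, int] = {"<pad>": PAD, "<unk>": UNK}
    for d in domains:
        for g in char_ngrams(d, n):
            vocab.setdefault(g, len(vocab))  # new n-gram gets the next free index
    return vocab


def encode(domain: str, vocab: Dict[str, int], n: int, max_len: int) -> List[int]:
    """Map a domain to n-gram indices, truncated or right-padded to max_len."""
    ids = [vocab.get(g, UNK) for g in char_ngrams(domain, n)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def conv_maxpool(ids: Sequence[int], kernel: Sequence[Dict[int, float]]) -> float:
    """1-D convolution over the index sequence, then global max pooling.

    Each kernel position is a sparse map from n-gram index to weight, which is
    equivalent to convolving over one-hot embeddings. Taking the maximum over
    all positions makes the output independent of where the pattern occurs.
    """
    width = len(kernel)
    scores = [
        sum(kernel[k].get(ids[start + k], 0.0) for k in range(width))
        for start in range(len(ids) - width + 1)
    ]
    return max(scores)


# Usage with a toy training set (illustrative data, not from the paper).
train = ["google.com", "wikipedia.org", "xjwqzkpv.net"]
vocab = build_vocab(train, n=2)
print(char_ngrams("google.com", 2))  # ['go', 'oo', 'og', 'gl', 'le']

# A width-2 filter that fires on the bigram sequence 'xj' followed by 'jw'.
kernel = [{vocab["xj"]: 1.0}, {vocab["jw"]: 1.0}]
front = conv_maxpool(encode("xjwabc.net", vocab, 2, 8), kernel)
back = conv_maxpool(encode("abcxjw.net", vocab, 2, 8), kernel)
print(front, back)  # same score wherever the pattern appears
```

The two pooled scores match even though the pattern sits at the start of one name and at the end of the other. This is the translation-invariance property the abstract attributes to convolutional networks, shown here on n-gram tokens rather than on learned features.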
ISSN: 0167-4048, 1872-6208
DOI: 10.1016/j.cose.2019.04.015