Using XGBoost to Discover Infected Hosts Based on HTTP Traffic

In recent years, the number of malware and infected hosts has increased exponentially, which causes great losses to governments, enterprises, and individuals. However, traditional technologies are difficult to timely detect malware that has been deformed, confused, or modified since they usually det...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Security and communication networks 2019, Vol.2019 (2019), p.1-11
Hauptverfasser:	Jiang, Tianyu, Hu, Teng, Zhang, Xiaosong, Li, Ting, Niu, Weina, Wu, Heng
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Behavior Clustering Command and control Communication Headers Internet service providers Machine learning Malware Protocol (computers) Servers Traffic speed
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In recent years, the number of malware and infected hosts has increased exponentially, which causes great losses to governments, enterprises, and individuals. However, traditional technologies are difficult to timely detect malware that has been deformed, confused, or modified since they usually detect hosts before being infected by malware. Host detection during malware infection can make up for their deficiency. Moreover, the infected host usually sends a connection request to the command and control (C&C) server using the HTTP protocol, which generates malicious external traffic. Thus, if the host is found to have malicious external traffic, the host may be a host infected by malware. Based on the background, this paper uses HTTP traffic combined with eXtreme Gradient Boosting (XGBoost) algorithm to detect infected hosts in order to improve detection efficiency and accuracy. The proposed approach uses a template automatic generation algorithm to generate feature templates for HTTP headers and uses XGBoost algorithm to distinguish between malicious traffic and normal traffic. We conduct a performance analysis to demonstrate that our approach is efficient using dataset, which includes malware traffic from MALWARE-TRAFFIC-ANALYSIS.NET and normal traffic from UNSW-NB 15. Experimental results show that the detection speed is about 1859 HTTP traffic per second, and the detection accuracy reaches 98.72%, and the false positive rate is less than 1%.
ISSN:	1939-0114 1939-0122
DOI:	10.1155/2019/2182615