A deep learning approach for detecting malicious JavaScript code

Bibliographic details
Published in:	Security and communication networks 2016-07, Vol.9 (11), p.1520-1534
Main authors:	Wang, Yao; Cai, Wan‐dong; Wei, Peng‐cheng
Format:	Article
Language:	eng
Subjects:	Accuracy; Architecture; Classifiers; deep learning; Java (programming language); JavaScript attacks; Learning; logistic regression; Logistics; random projection; Regression; SdA; Security; static analysis
Online access:	Full text

Description
Abstract:	Malicious JavaScript code in webpages on the Internet is an emergent security issue because of its universality and potentially severe impact. Because of its obfuscation and complexities, detecting it has a considerable cost. Over the last few years, several machine learning‐based detection approaches have been proposed; most of them use shallow discriminating models with features that are constructed with artificial rules. However, with the advent of the big data era for information transmission, these existing methods already cannot satisfy actual needs. In this paper, we present a new deep learning framework for detection of malicious JavaScript code, from which we obtained the highest detection accuracy compared with the control group. The architecture is composed of a sparse random projection, deep learning model, and logistic regression. Stacked denoising auto‐encoders were used to extract high‐level features from JavaScript code; logistic regression as a classifier was used to distinguish between malicious and benign JavaScript code. Experimental results indicated that our architecture, with over 27 000 labeled samples, can achieve an accuracy of up to 95%, with a false positive rate less than 4.2% in the best case. Copyright © 2016 John Wiley & Sons, Ltd. Most of the machine learning‐based approaches for detecting malicious JavaScript code depend on manually designed features. This paper proposed a deep learning‐based approach to analyze JavaScript code features automatically with little manual intervention. By using the learned features from our deep learning framework, a logistic regression classifier can efficiently detect malicious JavaScript code and has sufficient capacity to discover unknown attacks.
ISSN:	1939-0114 1939-0122
DOI:	10.1002/sec.1441
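
Note on the abstract: the architecture it names has three stages (sparse random projection, stacked denoising auto‐encoders, logistic regression). As a rough illustration of the first stage only, the following is a minimal sketch of a generic sparse random projection. It is not the authors' implementation. The function names, the `density` value, the dimensions, and the toy input vector are all assumptions made for the example; the record does not state how the paper encodes JavaScript code or which parameters it uses.

```python
import math
import random


def sparse_projection_matrix(n_features, n_components, density=1 / 3, seed=0):
    """Build an n_components x n_features sparse random matrix.

    Each entry is +v or -v with probability density/2 each, and 0 otherwise,
    where v = sqrt(1 / (density * n_components)).
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 / (density * n_components))
    matrix = []
    for _ in range(n_components):
        row = []
        for _ in range(n_features):
            u = rng.random()
            if u < density / 2:
                row.append(scale)
            elif u < density:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(vector, matrix):
    """Map a high-dimensional feature vector to the reduced space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical input: a 1000-dimensional binary feature vector for one script.
matrix = sparse_projection_matrix(n_features=1000, n_components=50)
sample = [1.0 if i % 7 == 0 else 0.0 for i in range(1000)]
reduced = project(sample, matrix)
print(len(reduced))  # 50
```

In a pipeline of the kind the abstract describes, the reduced vectors would then be passed to the feature-learning stage and finally to a classifier; those stages are omitted here.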