Deep Learning and Visualization for Identifying Malware Families

Bibliographic Details
Published in:	IEEE Transactions on Dependable and Secure Computing, 2021-01, Vol. 18 (1), pp. 283-295
Main authors:	Sun, Guosong; Qian, Quan
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Computer Science; Computer Science, Hardware & Architecture; Computer Science, Information Systems; Computer Science, Software Engineering; convolutional neural network; Datasets; Deep learning; Feature extraction; Image classification; Malware; Malware family identification; malware feature image; Neural networks; recurrent neural network; Recurrent neural networks; Science & Technology; Static analysis; Static code analysis; Technology; Training

Description
Abstract:	The growing threat of malware is becoming more and more difficult to ignore. In this paper, a malware feature image generation method is used to combine the static analysis of malicious code with the methods of recurrent neural networks (RNN) and convolutional neural networks (CNN). By using an RNN, our method considers not only the original information of malware but also the ability to associate the original code with timing characteristics; furthermore, the process reduces the dependence on category labels of malware. Then, we use minhash to generate feature images from the fusion of the original codes and the predictive codes from the RNN. Finally, we train a CNN to classify feature images. When we trained on very few samples (the ratio of training set size to validation set size was 1:30), we obtained an accuracy of over 92 percent. When we adjusted the ratio to 3:1, the accuracy exceeded 99.5 percent. As shown in the confusion matrices, our method obtains a good result, where the worst false positive rate across all malware families is 0.0147 and the average false positive rate is 0.0058.
ISSN:	1545-5971, 1941-0018
DOI:	10.1109/TDSC.2018.2884928
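
The abstract reports a worst and an average false positive rate per malware family, read from confusion matrices. As a minimal sketch of how such figures can be derived, the Python below computes a one-vs-rest false positive rate for each class. The function name, the row/column orientation (rows are true families, columns are predicted families), the unweighted averaging, and the toy counts are all assumptions for illustration; none of them is taken from the paper.

```python
def false_positive_rates(confusion):
    """Per-class false positive rate from a square confusion matrix.

    Assumed orientation: confusion[i][j] is the number of samples whose
    true class is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(n):
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        false_pos = predicted_k - confusion[k][k]  # predicted k, truly another class
        negatives = total - actual_k               # all samples not of class k (FP + TN)
        rates.append(false_pos / negatives if negatives else 0.0)
    return rates


# Toy 3-family matrix with made-up counts (not data from the paper).
toy = [
    [50, 0, 0],
    [5, 45, 0],
    [0, 0, 50],
]

rates = false_positive_rates(toy)
worst = max(rates)                 # highest per-family false positive rate
average = sum(rates) / len(rates)  # unweighted mean over families
```

In the toy matrix, five samples of the second family are misclassified as the first, so only the first family has a nonzero false positive rate. Whether the paper averages over families in this unweighted way is not stated in the abstract.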