Deep Learning and Visualization for Identifying Malware Families

Bibliographic details
Published in: IEEE Transactions on Dependable and Secure Computing, 2021-01, Vol. 18 (1), pp. 283-295
Authors: Sun, Guosong; Qian, Quan
Format: Article
Language: English
Abstract: The growing threat of malware is becoming increasingly difficult to ignore. In this paper, a method for generating malware feature images is used to combine static analysis of malicious code with recurrent neural networks (RNNs) and convolutional neural networks (CNNs). By using an RNN, our method considers not only the original information of the malware but also the ability to associate the original code with its timing characteristics; this process also reduces the dependence on malware category labels. We then use minhash to generate feature images from the fusion of the original code and the code predicted by the RNN. Finally, we train a CNN to classify the feature images. When training on very few samples (a training-to-validation sample-size ratio of 1:30), we obtained an accuracy above 92 percent. When we adjusted the ratio to 3:1, the accuracy exceeded 99.5 percent. The confusion matrices show that the method performs well: the worst false positive rate across all malware families is 0.0147, and the average false positive rate is 0.0058.
ISSN: 1545-5971, 1941-0018
DOI: 10.1109/TDSC.2018.2884928
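The abstract says minhash turns the fusion of the original code and the RNN-predicted code into a feature image, but the record does not state how code is tokenized, how the two sequences are fused, or how hash values become pixels. The Python sketch below is therefore a hypothetical illustration, not the authors' implementation. It assumes opcode-token inputs, treats "fusion" as the union of the two sequences' 3-gram shingle sets, and fills each pixel with the lowest 8 bits of one MinHash value (b-bit MinHash with b = 8). The names `shingles`, `seeded_hash`, `minhash_signature`, and `feature_image` are invented for this sketch. The RNN and CNN stages are out of scope; `predicted` simply stands in for the RNN's output.

```python
import hashlib
from typing import List, Sequence, Set


def shingles(tokens: Sequence[str], n: int = 3) -> Set[str]:
    """Return the set of contiguous n-grams (shingles) of a token sequence."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def seeded_hash(seed: int, item: str) -> int:
    """Deterministic 32-bit hash of `item`; each seed acts as a separate hash function."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def minhash_signature(items: Set[str], num_hashes: int) -> List[int]:
    """MinHash: for each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    return [min(seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def feature_image(original: Sequence[str], predicted: Sequence[str],
                  side: int = 16, n: int = 3) -> List[List[int]]:
    """Build a side x side grayscale 'feature image' from two token sequences.

    Hypothetical choices (assumptions, not taken from the paper):
    - fusion = union of the shingle sets of the original and predicted sequences;
    - pixel  = lowest 8 bits of one MinHash value (b-bit MinHash, b = 8).
    """
    fused = shingles(original, n) | shingles(predicted, n)
    signature = minhash_signature(fused, side * side)
    pixels = [value & 0xFF for value in signature]  # one byte per pixel, 0..255
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


if __name__ == "__main__":
    # Hypothetical opcode sequences; `pred` stands in for RNN-predicted code.
    orig = ["push", "mov", "call", "pop", "ret", "mov", "xor", "jmp"]
    pred = ["mov", "call", "pop", "ret", "xor"]
    img = feature_image(orig, pred)
    print(len(img), len(img[0]))  # 16 16
```

Because each MinHash position agrees between two sets with probability equal to their Jaccard similarity, samples whose fused shingle sets overlap heavily yield images that share most pixels. This is the property that would let a downstream CNN group samples from the same family. The 16 x 16 size and 3-gram shingles are arbitrary illustrative defaults.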