Research on the Construction of Malware Variant Datasets and Their Detection Method

Malware detection is of great significance for maintaining the security of information systems. Malware obfuscation techniques and malware variants are increasingly emerging, but their samples and API (application programming interface) sequences are difficult to obtain. This poses difficulties for...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied sciences 2022-08, Vol.12 (15), p.7546
Hauptverfasser: Lu, Faming, Cai, Zhaoyang, Lin, Zedong, Bao, Yunxia, Tang, Mengfan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Malware detection is of great significance for maintaining the security of information systems. Malware obfuscation techniques and malware variants are increasingly emerging, but their samples and API (application programming interface) sequences are difficult to obtain. This poses difficulties for the development of malware variant detection models. To address this issue in this paper, we first generated a malware variant dataset using the obfuscation technique based on the disassembly and decompilation of malware. Then, an API call dataset of these malware variants was constructed through sandboxing. Compared to similar work, the malware variants and their obfuscated API call sequences generated in this paper were all runnable. After that, taking a public API call sequence dataset of obfuscation-free malware as input, a BERT (bidirectional encoder representation from transformers) pretrained model for malware detection was constructed. To enhance the ability of this pretrained model to handle obfuscation and variants, in this paper, we used adversarial training to improve the robustness and generalization of the detection model under obfuscation. As the experimental results show, the proposed scheme can improve the classification performance of malware variants under obfuscation. The accuracy of the malware variant classification was close to that of the unobfuscated case.
ISSN:2076-3417
2076-3417
DOI:10.3390/app12157546