N-gram MalGAN: Evading machine learning detection via feature n-gram


Bibliographic details
Published in: Digital Communications and Networks, 2022-08, Vol. 8 (4), pp. 485-491
Authors: Zhu, Enmin; Zhang, Jianjie; Yan, Jijie; Chen, Kongyang; Gao, Chongzhi
Format: Article
Language: English
Description
Abstract: In recent years, many adversarial malware examples with different feature strategies, especially GAN and its variants, have been introduced to handle security threats, e.g., evading the detection of machine learning detectors. However, these solutions still suffer from complicated deployment or long running times. In this paper, we propose an n-gram MalGAN method to solve these problems. We borrow the idea of the n-gram from the Natural Language Processing (NLP) area to expand feature sources for adversarial malware examples in MalGAN. The n-gram MalGAN obtains the feature vector directly from the hexadecimal bytecodes of the executable file. It can be implemented easily and conveniently in a simple programming language (e.g., C++), with no need for any prior knowledge of the executable file format or any professional feature extraction tools. These features are functionally independent and thus can be added to the non-functional area of the malicious program to preserve its original executability. In this way, the n-gram makes the adversarial attack easier and more convenient. Experimental results show that the evasion rate of the n-gram MalGAN is at least 88.58% against different machine learning algorithms under an appropriate group rate, reaching 100% for the Random Forest algorithm.
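The abstract states that features are n-gram counts taken directly over the raw hexadecimal bytecodes of an executable. As a rough illustration of that idea only (not the paper's actual MalGAN pipeline), the following Python sketch counts byte n-grams in a byte string; the function name, parameters, and the toy byte sequence are illustrative assumptions, not from the paper.

```python
from collections import Counter

def byte_ngram_features(data: bytes, n: int = 2, top_k: int = 8):
    """Count contiguous n-grams over raw bytes and return the top_k most common.

    This is a simplified stand-in for extracting an n-gram feature vector
    from an executable's hexadecimal bytecodes, as described in the abstract.
    """
    grams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return grams.most_common(top_k)

# Toy byte string standing in for (part of) an executable's contents.
sample = bytes.fromhex("4d5a90000300000004000000ffff0000")
print(byte_ngram_features(sample, n=2, top_k=3))
```

In a real pipeline, counts like these would be turned into a fixed-length feature vector (e.g., over a chosen n-gram vocabulary) before being fed to a detector or to the GAN's generator.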
ISSN: 2352-8648
DOI: 10.1016/j.dcan.2021.11.007