Auto-detection of sophisticated malware using lazy-binding control flow graph and deep learning

To date, industrial antivirus tools are mostly using signature-based methods to detect malware occurrences. However, sophisticated malware, such as metamorphic or polymorphic virus, can effectively evade those tools by using some advanced obfuscation techniques, including mutation and the dynamicall...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computers & security 2018-07, Vol.76, p.128-155
Hauptverfasser: Nguyen, Minh Hai, Nguyen, Dung Le, Nguyen, Xuan Mao, Quan, Tho Thanh
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:To date, industrial antivirus tools are mostly using signature-based methods to detect malware occurrences. However, sophisticated malware, such as metamorphic or polymorphic virus, can effectively evade those tools by using some advanced obfuscation techniques, including mutation and the dynamically executed contents (DEC) methods, which dynamically produce new executable code in the run-time. Common DEC methods used by malware programs are packing or calling external code. In the research community, the approach of program analysis to detect suspicious behaviors has been emerging recently to handle this problem. Control flow graph (CFG) is a suitable representation to capture common behaviors from various mutated samples of virus. However, the current typical CFG forms generated by state-of-the-art binary analysis tools, such as IDA Pro, do not precisely reflect the behaviors of DEC methods. Moreover, this approach suffers from an extremely heavy cost to conduct and analyze the CFGs from binaries. This drawback causes the method of formal behavior analysis to be virtually not applicable with real-world applications. In this paper, we propose an enhanced form of CFG, known as lazy-binding CFG to reflect the DEC behaviors. Then, with the recent advancement of the deep learning techniques, we present a method of producing image-based representation from the generated CFG. As deep learning is very popular to perform image classification on very large dataset, our proposed technique can be applied for malware detection on real-world computer programs and thus enjoying very high accuracy. We also illustrate our analysis results with some well-known malware samples, including WannaCry, Kasperagent and Sality, one of the most sophisticated polymorphic viruses.
ISSN:0167-4048
1872-6208
DOI:10.1016/j.cose.2018.02.006