SUSPEND: Determining software suspiciousness by non-stationary time series modeling of entropy signals


Bibliographic details
Published in: Expert Systems with Applications, 2017-04, Vol. 71, pp. 301-318
Authors: Wojnowicz, Michael; Chisholm, Glenn; Wallace, Brian; Wolff, Matt; Zhao, Xuan; Luan, Jay
Format: Article
Language: English
Description
Abstract:
• Software entropy is traditionally used for packer detection.
• Here, software entropy is represented as a non-stationary time series.
• Features are extracted using wavelets, change point models, and detrended fluctuation analysis.
• These features improve large-scale discrimination between malicious and clean files.

Commercial anti-virus software traditionally memorizes specific byte sequences (known as “signatures”) in the file contents of previously encountered malware. However, malware authors can evade signature-based detection in many ways: for instance, by using obfuscation techniques such as “packing” (encryption or compression) to hide snippets of malicious code, by writing metamorphic malware, or by tampering with existing malware. We hypothesize that certain evasion techniques can leave traces in the file’s entropy signal, revealing either similarities to known malware or the presence of tampering per se. To this end, we present SUSPEND (SUSPicious ENtropy signal Detector), an expert system that evaluates the suspiciousness of an executable file’s entropy signal in order to support malware classification. Whereas entropy analysis has traditionally been used for packer detection (so entropy-based features often comprise merely the mean entropy or the entropy of a few file subcomponents), SUSPEND applies non-stationary time series modeling to aid malware detection. In particular, SUSPEND (a) quantifies the “amount of structure” in the entropy signal (through detrended fluctuation analysis), (b) finds the location and size of sudden jumps in entropy (through mean change point modeling), and (c) computes the distribution of entropic variation across multiple spatial scales (through wavelet decomposition). In addition, SUSPEND (d) summarizes the entropy signal’s empirical probability distribution. Because SUSPEND’s run time can be made to scale linearly in file size, it is well suited for large-scale malware analysis.
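The abstract names the four feature families (a)-(d) but not their parameters. The sketch below illustrates each one in plain Python. It rests on several assumptions that are not taken from the paper: the entropy signal is computed over non-overlapping 256-byte chunks; DFA uses window sizes 4 to 64; the change point model looks for a single mean shift, although SUSPEND's model may locate several; the wavelet is a Haar wavelet, which is our choice; and the summary statistics are an arbitrary selection. All function names are hypothetical and do not describe SUSPEND's actual implementation. Each pass takes time linear in the signal length (DFA is linear per window size), which is in the spirit of the abstract's scaling claim.

```python
import math
from collections import Counter

# Hypothetical sketch of SUSPEND-style features; the chunk size, DFA scales,
# single change point, and Haar wavelet are illustrative assumptions.

def entropy_signal(data: bytes, chunk: int = 256) -> list[float]:
    """Shannon entropy (bits per byte, 0..8) of each full, non-overlapping chunk.
    A trailing partial chunk is ignored."""
    signal = []
    for start in range(0, len(data) - chunk + 1, chunk):
        counts = Counter(data[start:start + chunk])
        signal.append(-sum((c / chunk) * math.log2(c / chunk) for c in counts.values()))
    return signal


def _linfit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def dfa_alpha(signal, scales=(4, 8, 16, 32, 64)) -> float:
    """Detrended fluctuation analysis: slope of log F(n) versus log n.
    Window sizes that do not fit, or have zero fluctuation, are skipped."""
    mean, total, profile = sum(signal) / len(signal), 0.0, []
    for v in signal:  # integrated, mean-removed profile
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        windows = len(profile) // n
        if windows == 0:
            continue
        xs, sq = list(range(n)), 0.0
        for w in range(windows):  # remove a linear trend within each window
            seg = profile[w * n:(w + 1) * n]
            slope, icpt = _linfit(xs, seg)
            sq += sum((y - (icpt + slope * x)) ** 2 for x, y in zip(xs, seg))
        f = math.sqrt(sq / (windows * n))
        if f > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(f))
    return _linfit(log_n, log_f)[0] if len(log_n) >= 2 else float("nan")


def mean_change_point(signal):
    """Single mean shift (needs len >= 2): returns (index, jump in mean entropy)."""
    n, s, q = len(signal), [0.0], [0.0]
    for v in signal:  # prefix sums make each candidate split O(1)
        s.append(s[-1] + v)
        q.append(q[-1] + v * v)

    def sse(i, j):  # squared error of signal[i:j] about its own mean
        return (q[j] - q[i]) - (s[j] - s[i]) ** 2 / (j - i)

    k = min(range(1, n), key=lambda j: sse(0, j) + sse(j, n))
    return k, (s[n] - s[k]) / (n - k) - s[k] / k


def haar_energy_by_level(signal) -> list[float]:
    """Energy of Haar detail coefficients per level, finest scale first.
    The signal is truncated to the largest power-of-two length."""
    size = 1
    while size * 2 <= len(signal):
        size *= 2
    approx, energies, r = list(signal[:size]), [], math.sqrt(2)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        energies.append(sum(((a - b) / r) ** 2 for a, b in pairs))
        approx = [(a + b) / r for a, b in pairs]
    return energies


def summarize(signal) -> dict:
    """A few statistics of the entropy values' empirical distribution."""
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))
    return {"mean": mean, "std": std, "min": min(signal), "max": max(signal)}
```

As an example, take 1,024 zero bytes followed by four copies of the bytes 0..255. With 256-byte chunks this input yields the entropy signal [0, 0, 0, 0, 8, 8, 8, 8]. `mean_change_point` then reports a jump of 8.0 bits at chunk index 4, and nearly all of the Haar energy sits at the coarsest level.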
We apply SUSPEND to a large-scale malware detection task with 500,000 heterogeneous real-world samples and over 1 million features. We find that SUSPEND boosts the predictive performance of traditional entropy analysis (as found in packer detectors) from 77.02% to 96.62%. Moreover, SUSPEND’s focus on entropy signals makes it a natural candidate for combining with other types of features; for instance, combining SUSPEND with a strings-based feature set boosts predictive accuracy from 97.18% to 98.62%. Thus, whereas tr
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2016.11.027