Compact feature hashing for machine learning based malware detection

Bibliographic details
Published in: ICT Express 2022, 8(1), pp. 124-129
Main authors: Moon, Damin; Lee, JaeKoo; Yoon, MyungKeun
Format: Article
Language: English
Description
Summary: Machine learning can detect variant malware files that can evade signature-based detection. Feature hashing is used to convert features into a fixed-length vector. In this paper, we study the appropriate vector size for feature hashing for a large dataset of malware files. Through exhaustive experiments on more than 280,000 real malware and benign files, we find for the first time that the default vector size of current feature hashing practices is unnecessarily large. We experimentally explore the appropriate vector size, which not only reduces memory space by 70% but also increases the detection accuracy, compared with the state-of-the-art scheme.
ISSN: 2405-9595
DOI: 10.1016/j.icte.2021.08.005
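
Illustration: the summary says that feature hashing converts features into a fixed-length vector and that the choice of vector size drives memory use. The sketch below shows the general hashing trick only; it is not the authors' implementation. The function name hash_features, the example feature strings, and the vector sizes 1024 and 256 are assumptions made for illustration and are not taken from the paper.

```python
import hashlib


def hash_features(tokens, vector_size):
    """Map string features into a fixed-length count vector (hashing trick).

    Illustrative sketch only: hash_features is a hypothetical helper,
    not an API from the paper or from any library.
    """
    vec = [0] * vector_size
    for tok in tokens:
        # A stable hash keeps the mapping reproducible across runs.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "little") % vector_size
        # Distinct features may collide on one index; smaller vectors
        # save memory at the cost of more collisions.
        vec[index] += 1
    return vec


# Hypothetical features extracted from one file (assumed for illustration).
features = ["CreateFileA", "RegSetValueExA", "CreateFileA"]

large = hash_features(features, 1024)  # assumed "default-like" size
small = hash_features(features, 256)   # assumed compact size

# The vector length is fixed by vector_size, not by the number of features.
print(len(large), len(small))
# Every feature occurrence lands in exactly one bucket.
print(sum(large), sum(small))
```

Memory per sample grows linearly with vector_size, which is why a smaller size that preserves accuracy, as the summary reports, is attractive. Which size is appropriate for a given dataset has to be determined experimentally.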