Compact feature hashing for machine learning based malware detection

Bibliographic details
Published in:	ICT Express 2022, 8(1), pp. 124-129
Main authors:	Moon, Damin; Lee, JaeKoo; Yoon, MyungKeun
Format:	Article
Language:	English
Subjects:	Feature hashing; Feature vector; Machine learning; Malware detection; Electronics/Information and Communication Engineering
Online access:	Full text

Description
Abstract:	Machine learning can detect variant malware files that can evade signature-based detection. Feature hashing is used to convert features into a fixed-length vector. In this paper, we study the appropriate vector size for feature hashing for a large dataset of malware files. Through exhaustive experiments on more than 280,000 real malware and benign files, we find for the first time that the default vector size of current feature hashing practices is unnecessarily large. We experimentally explore the appropriate vector size, which not only reduces memory space by 70% but also increases the detection accuracy, compared with the state-of-the-art scheme.
ISSN:	2405-9595
DOI:	10.1016/j.icte.2021.08.005
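
Illustration (not part of the catalog record or the article): the abstract refers to feature hashing, which maps a variable number of features to a fixed-length vector. The sketch below shows the general technique only, using the Python standard library. The function name hash_features, the example feature strings, the use of MD5, and the vector sizes 64 and 1024 are all assumptions chosen for illustration; they are not the authors' features, hash function, or evaluated sizes.

```python
import hashlib


def hash_features(tokens, n_features):
    """Map string features to a fixed-length vector (the 'hashing trick').

    Hypothetical helper for illustration; not the method from the article.
    """
    vec = [0.0] * n_features
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # The first 8 bytes of the digest select the bucket.
        index = int.from_bytes(digest[:8], "little") % n_features
        # One further bit selects a sign, so colliding features
        # can partially cancel instead of always accumulating.
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign
    return vec


# Made-up example features (e.g. imported functions of a binary).
features = [
    "kernel32.dll:CreateFileA",
    "kernel32.dll:WriteFile",
    "ws2_32.dll:connect",
]

small = hash_features(features, 64)    # assumed "compact" size
large = hash_features(features, 1024)  # assumed "large" size
print(len(small), len(large))
```

The output length depends only on n_features, never on how many features a file has. A smaller vector uses less memory per sample but makes hash collisions more likely, which is the trade-off behind the choice of vector size studied in the article.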