Performance-Oriented and Sustainability-Oriented Design of an Effective Android Malware Detector
Published in: | IEEE Access 2024, Vol. 12, p. 159036-159055 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Effective Android malware detection is a complex problem because malware is rapidly evolving, complicated, and diverse. The design of malware detectors should prioritise a high detection rate, efficient use of computational resources, and sustainability. With these design priorities in mind, we develop and empirically evaluate four different classifiers. Firstly, to ensure a high detection rate, we use a dataset compiled using hybrid analysis of a diverse set of apps. Unlike most publicly available Android datasets, the dynamic analysis of each app was carried out on a real device and not on a virtual setup. This means that the dataset contains a comprehensive profile of sophisticated malware capable of changing its behaviour on a virtual setup. Secondly, to enhance efficiency, we explore the use of a GPU-based setup and different feature selection techniques. Lastly, we emphasize sustainability by training the models on apps that date back to the beginning of the Android ecosystem, i.e., from 2008 to 2020. Our results show that Random Forest (RF) is the most effective classifier, with the highest accuracy of 97.86%. This accuracy is 2.78% higher than the best accuracy reported in existing literature. The data also show that RF is the most sustainable classifier, with a minimal decrease in F1 score for over-time performance. With regard to efficiency, we find that Logistic Regression (LR) is the best option and that the training time of most models improves significantly when a GPU-based setup is used instead of a CPU-based setup. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3486094 |
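
The abstract reports sustainability as the change in F1 score over time but does not say how that is computed. As a rough illustration only, the sketch below assumes a classifier's predictions on test apps are grouped by app year and scored per group. The function names `f1_score` and `f1_by_year`, the label convention (1 = malware, 0 = benign), and the toy data are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed: 1 = malware); 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_year(years, y_true, y_pred):
    """Group samples by year and return {year: F1}, ordered by year."""
    groups = defaultdict(lambda: ([], []))
    for year, t, p in zip(years, y_true, y_pred):
        groups[year][0].append(t)
        groups[year][1].append(p)
    return {year: f1_score(ts, ps) for year, (ts, ps) in sorted(groups.items())}


# Toy data, invented for illustration: labels and predictions for eight apps.
years = [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

scores = f1_by_year(years, y_true, y_pred)
print(scores)
```

Under this reading, a classifier whose per-year F1 stays nearly flat on apps newer than its training data would count as more sustainable than one whose score drops sharply.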