A Lightweight Android Malware Classifier Using Novel Feature Selection Methods

Smartphones and mobile tablets play significant roles in daily life and have led to an increase in the number of users of this technology. The rising number of mobile device end-users has resulted in the generation of malware by hackers. Thus, mobile devices are becoming vulnerable to malware. Machi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Symmetry (Basel) 2020-05, Vol.12 (5), p.858
Hauptverfasser: Salah, Ahmad, Shalabi, Eman, Khedr, Walid
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Smartphones and mobile tablets play significant roles in daily life and have led to an increase in the number of users of this technology. The rising number of mobile device end-users has resulted in the generation of malware by hackers. Thus, mobile devices are becoming vulnerable to malware. Machine learning plays an important role in the detection of mobile malware applications. In this study, we focus on static analysis for Android malware detection. The ultimate goal of this research is to find out the symmetric features across the malware Android application to easily detect them. Many state-of-the-art methods focus on extracting asymmetric patterns of the category of features, e.g., application permissions to distinguish the malware application from the benign application. In this work, we propose a compromise by considering different types of static features and select the most important features that affect the detection process. These features represent the symmetric pattern to be used for the classification task. Inspired by TF-IDF, we propose a novel method of feature selection. Moreover, we propose a new method for merging the Android application URLs into a single feature called the URL_score. Several linear machine learning classifiers are utilized to evaluate the proposed method. The proposed methods significantly reduce the feature space, i.e., the symmetric pattern, of the Android application dataset and the memory size of the final model. In addition, the proposed model achieves the highest reported accuracy for the Drebin dataset to date. Based on the evaluation results, the linear support vector machine achieves an accuracy of 99%.
ISSN:2073-8994
2073-8994
DOI:10.3390/sym12050858