Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation

Bibliographic details
Published in: IEEE Transactions on Computers, 2023-11, Vol. 72 (11), p. 1-13
Main authors: Rodrigues, Walber M., Walmsley, Felipe N., Cavalcanti, George D. C., Cruz, Rafael M. O.
Format: Article
Language: English
Description
Abstract: The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms on this problem has not been conducted. This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms that transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity caused by the collisions. Our analyses show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers are at a disadvantage on binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, their usage was detrimental to the performance of more robust techniques, such as dynamic ensemble models, which are better suited for handling difficult cases. The dataset and code used for the experiments are available at https://github.com/walbermr/android-srm-ml-evaluation.
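Illustrative note: the abstract attributes the weakness of some classifiers to collisions between binary feature vectors. The following minimal sketch shows one way such a collision rate could be measured. It is not taken from the paper or its repository; the function name `collision_rate`, the definition used (an instance counts as colliding when an identical feature vector also occurs with a different class label), and the toy data with class names "A", "B", "C" are all assumptions made for illustration.

```python
from collections import defaultdict


def collision_rate(features, labels):
    """Fraction of instances whose exact binary feature vector also
    appears with a different class label (assumed definition)."""
    if not features:
        return 0.0
    # Collect the set of labels observed for each distinct feature vector.
    labels_by_vector = defaultdict(set)
    for vector, label in zip(features, labels):
        labels_by_vector[tuple(vector)].add(label)
    # An instance is ambiguous if its vector maps to more than one label.
    ambiguous = sum(
        1 for vector in features if len(labels_by_vector[tuple(vector)]) > 1
    )
    return ambiguous / len(features)


# Hypothetical toy data: the first two instances share a vector but not a label.
toy_features = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 1)]
toy_labels = ["A", "B", "C", "B"]
rate = collision_rate(toy_features, toy_labels)
print(f"collision rate: {rate:.2f}")
```

Under this assumed definition, colliding instances are indistinguishable to any classifier that sees only the binary features, which is the kind of ambiguity the abstract says real-valued embeddings aim to reduce.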
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2023.3291998