Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation

The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the...

Bibliographic Details
Published in: IEEE transactions on computers 2023-11, Vol.72 (11), p.1-13
Main authors: Rodrigues, Walber M., Walmsley, Felipe N., Cavalcanti, George D. C., Cruz, Rafael M. O.
Format: Article
Language: eng
container_end_page 13
container_issue 11
container_start_page 1
container_title IEEE transactions on computers
container_volume 72
creator Rodrigues, Walber M.
Walmsley, Felipe N.
Cavalcanti, George D. C.
Cruz, Rafael M. O.
description The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms over this problem has not been conducted. This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms to transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity promoted by the collision. Our analyses show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers are at a disadvantage on binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, their use was detrimental to the performance of more robust techniques, such as dynamic ensemble models better suited for handling difficult cases. The dataset and code used for the experiments are available in the following repository: https://github.com/walbermr/android-srm-ml-evaluation.
doi_str_mv 10.1109/TC.2023.3291998
format Article
coden ITCOB4
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
date 2023-11-01
ieee_id 10183829
linktohtml https://ieeexplore.ieee.org/document/10183829
orcid https://orcid.org/0000-0001-9446-1040
https://orcid.org/0000-0003-1410-1059
https://orcid.org/0000-0001-7714-2283
https://orcid.org/0000-0002-8809-6304
pqid 2875578352
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
toplevel peer_reviewed
online_resources
tpages 13
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9340
ispartof IEEE transactions on computers, 2023-11, Vol.72 (11), p.1-13
issn 0018-9340
1557-9956
language eng
recordid cdi_proquest_journals_2875578352
source IEEE Electronic Library (IEL)
subjects Algorithms
Ambiguity
Android security
API Classification
Binary features
Classification
Classification algorithms
Classifiers
Codes
Collision rates
Decision analysis
Decision trees
Embedding
Empirical analysis
Feature extraction
Handles
Machine learning
Machine learning algorithms
Multiple Classifier Systems
Pipelines
Security
title Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T18%3A34%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Security%20Relevant%20Methods%20of%20Android's%20API%20Classification:%20A%20Machine%20Learning%20Empirical%20Evaluation&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Rodrigues,%20Walber%20M.&rft.date=2023-11-01&rft.volume=72&rft.issue=11&rft.spage=1&rft.epage=13&rft.pages=1-13&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2023.3291998&rft_dat=%3Cproquest_RIE%3E2875578352%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2875578352&rft_id=info:pmid/&rft_ieee_id=10183829&rfr_iscdi=true