Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation
The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the...
Published in: | IEEE transactions on computers 2023-11, Vol.72 (11), p.1-13 |
---|---|
Main authors: | Rodrigues, Walber M.; Walmsley, Felipe N.; Cavalcanti, George D. C.; Cruz, Rafael M. O. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 13 |
---|---|
container_issue | 11 |
container_start_page | 1 |
container_title | IEEE transactions on computers |
container_volume | 72 |
creator | Rodrigues, Walber M. ; Walmsley, Felipe N. ; Cavalcanti, George D. C. ; Cruz, Rafael M. O. |
description | The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms over this problem has not been conceived. This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms to transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity promoted by the collision. Our analyses show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers have a disadvantage in binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, its usage was detrimental to the performance of more robust techniques, such as dynamic ensemble models better suited for handling difficult cases. The dataset and code used for the experiments are available in the following repository: https://github.com/walbermr/android-srm-ml-evaluation . |
doi_str_mv | 10.1109/TC.2023.3291998 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2875578352</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10183829</ieee_id><sourcerecordid>2875578352</sourcerecordid><originalsourceid>FETCH-LOGICAL-c244t-89b1ab546bcba02e8ef4ca2f5346cb5c64410b895ed2e5892679ba65d71779413</originalsourceid><addsrcrecordid>eNpNkD1PwzAARC0EEqUwszBYYmBK68_EZouiApVagaDMlu041FWaFDup1H9PShmYbnl3Jz0AbjGaYIzkdFVMCCJ0QonEUoozMMKcZ4mUPD0HI4SwSCRl6BJcxbhBCKUEyREwH872wXcH-O5qt9dNB5euW7dlhG0F86YMrS8fIszf5rCodYy-8lZ3vm0eYQ6X2q594-DC6dD45gvOtjsfBqCGs72u-1_wGlxUuo7u5i_H4PNptipeksXr87zIF4kljHWJkAZrw1lqrNGIOOEqZjWpOGWpNdymjGFkhOSuJI4LSdJMGp3yMsNZJhmmY3B_2t2F9rt3sVObtg_NcKmIyAYXgnIyUNMTZUMbY3CV2gW_1eGgMFJHkWpVqKNI9SdyaNydGt4594_Gggoi6Q_2dW4q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2875578352</pqid></control><display><type>article</type><title>Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation</title><source>IEEE Electronic Library (IEL)</source><creator>Rodrigues, Walber M. ; Walmsley, Felipe N. ; Cavalcanti, George D. C. ; Cruz, Rafael M. O.</creator><creatorcontrib>Rodrigues, Walber M. ; Walmsley, Felipe N. ; Cavalcanti, George D. C. ; Cruz, Rafael M. O.</creatorcontrib><description>The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. 
Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms over this problem has not been conceived. This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms to transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity promoted by the collision. Our analyzes show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers have a disadvantage in binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, its usage was detrimental to the performance of more robust techniques, such as dynamic ensemble models better suited for handling difficult cases. The dataset and code used for the experiments are available in the following repository: https://github.com/walbermr/android-srm-ml-evaluation .</description><identifier>ISSN: 0018-9340</identifier><identifier>EISSN: 1557-9956</identifier><identifier>DOI: 10.1109/TC.2023.3291998</identifier><identifier>CODEN: ITCOB4</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Ambiguity ; Android security ; API Classification ; Binary features ; Classification ; Classification algorithms ; Classifiers ; Codes ; Collision rates ; Decision analysis ; Decision trees ; Embedding ; Empirical analysis ; Feature extraction ; Handles ; Machine learning ; Machine learning algorithms ; Multiple Classifier Systems ; Pipelines ; Security</subject><ispartof>IEEE transactions on computers, 2023-11, Vol.72 (11), p.1-13</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c244t-89b1ab546bcba02e8ef4ca2f5346cb5c64410b895ed2e5892679ba65d71779413</cites><orcidid>0000-0001-9446-1040 ; 0000-0003-1410-1059 ; 0000-0001-7714-2283 ; 0000-0002-8809-6304</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10183829$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10183829$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Rodrigues, Walber M.</creatorcontrib><creatorcontrib>Walmsley, Felipe N.</creatorcontrib><creatorcontrib>Cavalcanti, George D. C.</creatorcontrib><creatorcontrib>Cruz, Rafael M. O.</creatorcontrib><title>Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation</title><title>IEEE transactions on computers</title><addtitle>TC</addtitle><description>The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms over this problem has not been conceived. 
This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms to transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity promoted by the collision. Our analyzes show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers have a disadvantage in binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, its usage was detrimental to the performance of more robust techniques, such as dynamic ensemble models better suited for handling difficult cases. The dataset and code used for the experiments are available in the following repository: https://github.com/walbermr/android-srm-ml-evaluation .</description><subject>Algorithms</subject><subject>Ambiguity</subject><subject>Android security</subject><subject>API Classification</subject><subject>Binary features</subject><subject>Classification</subject><subject>Classification algorithms</subject><subject>Classifiers</subject><subject>Codes</subject><subject>Collision rates</subject><subject>Decision analysis</subject><subject>Decision trees</subject><subject>Embedding</subject><subject>Empirical analysis</subject><subject>Feature extraction</subject><subject>Handles</subject><subject>Machine learning</subject><subject>Machine learning algorithms</subject><subject>Multiple Classifier 
Systems</subject><subject>Pipelines</subject><subject>Security</subject><issn>0018-9340</issn><issn>1557-9956</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkD1PwzAARC0EEqUwszBYYmBK68_EZouiApVagaDMlu041FWaFDup1H9PShmYbnl3Jz0AbjGaYIzkdFVMCCJ0QonEUoozMMKcZ4mUPD0HI4SwSCRl6BJcxbhBCKUEyREwH872wXcH-O5qt9dNB5euW7dlhG0F86YMrS8fIszf5rCodYy-8lZ3vm0eYQ6X2q594-DC6dD45gvOtjsfBqCGs72u-1_wGlxUuo7u5i_H4PNptipeksXr87zIF4kljHWJkAZrw1lqrNGIOOEqZjWpOGWpNdymjGFkhOSuJI4LSdJMGp3yMsNZJhmmY3B_2t2F9rt3sVObtg_NcKmIyAYXgnIyUNMTZUMbY3CV2gW_1eGgMFJHkWpVqKNI9SdyaNydGt4594_Gggoi6Q_2dW4q</recordid><startdate>20231101</startdate><enddate>20231101</enddate><creator>Rodrigues, Walber M.</creator><creator>Walmsley, Felipe N.</creator><creator>Cavalcanti, George D. C.</creator><creator>Cruz, Rafael M. O.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9446-1040</orcidid><orcidid>https://orcid.org/0000-0003-1410-1059</orcidid><orcidid>https://orcid.org/0000-0001-7714-2283</orcidid><orcidid>https://orcid.org/0000-0002-8809-6304</orcidid></search><sort><creationdate>20231101</creationdate><title>Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation</title><author>Rodrigues, Walber M. ; Walmsley, Felipe N. ; Cavalcanti, George D. C. ; Cruz, Rafael M. 
O.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c244t-89b1ab546bcba02e8ef4ca2f5346cb5c64410b895ed2e5892679ba65d71779413</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Ambiguity</topic><topic>Android security</topic><topic>API Classification</topic><topic>Binary features</topic><topic>Classification</topic><topic>Classification algorithms</topic><topic>Classifiers</topic><topic>Codes</topic><topic>Collision rates</topic><topic>Decision analysis</topic><topic>Decision trees</topic><topic>Embedding</topic><topic>Empirical analysis</topic><topic>Feature extraction</topic><topic>Handles</topic><topic>Machine learning</topic><topic>Machine learning algorithms</topic><topic>Multiple Classifier Systems</topic><topic>Pipelines</topic><topic>Security</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rodrigues, Walber M.</creatorcontrib><creatorcontrib>Walmsley, Felipe N.</creatorcontrib><creatorcontrib>Cavalcanti, George D. C.</creatorcontrib><creatorcontrib>Cruz, Rafael M. 
O.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on computers</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Rodrigues, Walber M.</au><au>Walmsley, Felipe N.</au><au>Cavalcanti, George D. C.</au><au>Cruz, Rafael M. O.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation</atitle><jtitle>IEEE transactions on computers</jtitle><stitle>TC</stitle><date>2023-11-01</date><risdate>2023</risdate><volume>72</volume><issue>11</issue><spage>1</spage><epage>13</epage><pages>1-13</pages><issn>0018-9340</issn><eissn>1557-9956</eissn><coden>ITCOB4</coden><abstract>The Android operating system provides functions and methods to handle sensitive data to secure users' data. The Android security literature extracts binary features from a method and classifies the method into one of the Security Relevant Method's classes, adding information about how the method handles sensitive data. However, the usage of binary features hinders the performance of some classifiers due to the high collision rate between instances. 
Although previous works have explored Security Relevant Method classification, an extensive study of machine learning algorithms over this problem has not been conceived. This work fills this gap, analyzing Monolithic classifiers, Multiple Classifier Systems, and Embedding algorithms to transform binary features into real-valued features, aiming to facilitate the classifier's work by minimizing the ambiguity promoted by the collision. Our analyzes show that META-DES, using a pool of Decision Trees trained with the Random Forest algorithm, statistically has the best results. We also find that, in general, distance-based classifiers have a disadvantage in binary features. Moreover, embedding techniques such as deep metric learning with triplet loss can reduce geometrical instance ambiguity, improving the performance of the weakest learning algorithms. However, its usage was detrimental to the performance of more robust techniques, such as dynamic ensemble models better suited for handling difficult cases. The dataset and code used for the experiments are available in the following repository: https://github.com/walbermr/android-srm-ml-evaluation .</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TC.2023.3291998</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-9446-1040</orcidid><orcidid>https://orcid.org/0000-0003-1410-1059</orcidid><orcidid>https://orcid.org/0000-0001-7714-2283</orcidid><orcidid>https://orcid.org/0000-0002-8809-6304</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9340 |
ispartof | IEEE transactions on computers, 2023-11, Vol.72 (11), p.1-13 |
issn | 0018-9340 ; 1557-9956 |
language | eng |
recordid | cdi_proquest_journals_2875578352 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Ambiguity ; Android security ; API Classification ; Binary features ; Classification ; Classification algorithms ; Classifiers ; Codes ; Collision rates ; Decision analysis ; Decision trees ; Embedding ; Empirical analysis ; Feature extraction ; Handles ; Machine learning ; Machine learning algorithms ; Multiple Classifier Systems ; Pipelines ; Security |
title | Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T18%3A34%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Security%20Relevant%20Methods%20of%20Android's%20API%20Classification:%20A%20Machine%20Learning%20Empirical%20Evaluation&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Rodrigues,%20Walber%20M.&rft.date=2023-11-01&rft.volume=72&rft.issue=11&rft.spage=1&rft.epage=13&rft.pages=1-13&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2023.3291998&rft_dat=%3Cproquest_RIE%3E2875578352%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2875578352&rft_id=info:pmid/&rft_ieee_id=10183829&rfr_iscdi=true |
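
The description above attributes the difficulty of binary features to a high collision rate between instances. As a rough illustration only, the sketch below assumes that a collision means two instances sharing exactly the same binary feature vector while carrying different class labels; the record does not give the article's formal definition, so this reading, the function name `collision_rate`, the toy vectors, and the label names are all hypothetical and are not taken from the authors' repository.

```python
from collections import defaultdict


def collision_rate(features, labels):
    """Fraction of instances whose exact feature vector also occurs
    with at least one different label (assumed definition of a collision)."""
    if not features:
        return 0.0
    # Collect every label observed for each distinct binary vector.
    labels_by_vector = defaultdict(set)
    for vector, label in zip(features, labels):
        labels_by_vector[tuple(vector)].add(label)
    # An instance collides when its vector maps to more than one label.
    colliding = sum(
        1 for vector in features if len(labels_by_vector[tuple(vector)]) > 1
    )
    return colliding / len(features)


# Toy data: the first two instances share a vector but disagree on the label.
# The label names are placeholders, not the article's class names.
features = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 1)]
labels = ["class_a", "class_b", "class_c", "class_b"]

rate = collision_rate(features, labels)
print(f"collision rate: {rate:.2f}")
```

Under this assumed definition, no classifier working on the raw binary vectors can separate colliding instances, which is one way to read the motivation the description gives for mapping binary features to real-valued embeddings.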