An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique
The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial im...
Published in: | International journal of machine learning and cybernetics 2020-02, Vol.11 (2), p.339-358 |
---|---|
Main authors: | Darshan, S. L. Shiva; Jaidhar, C. D. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 358 |
---|---|
container_issue | 2 |
container_start_page | 339 |
container_title | International journal of machine learning and cybernetics |
container_volume | 11 |
creator | Darshan, S. L. Shiva; Jaidhar, C. D. |
description | The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial improvement over the traditional signature-based detection technique, current malware employs code obfuscation techniques to elude detection. This paper presents the Hybrid Features-based malware detection system (HFMDS), which integrates static and dynamic features of portable executable (PE) files to discern malware. The HFMDS is trained with prominent features recommended by a filter-based feature selection technique (FST). The detection ability of the proposed HFMDS is evaluated with the random forest (RF) classifier on two different datasets that consist of real-world Windows malware samples. An in-depth analysis is carried out to determine the optimal number of decision trees (DTs) required by the RF classifier to achieve consistent accuracy. In addition, the performance of four popular FSTs is analyzed to determine which FST recommends the best features. From the experimental analysis, we can infer that increasing the number of DTs in the RF classifier beyond 160 does not make a significant difference in detection accuracy. |
doi_str_mv | 10.1007/s13042-019-00978-7 |
format | Article |
eissn | 1868-808X |
orcid | https://orcid.org/0000-0001-9556-1342 |
publisher | Berlin/Heidelberg: Springer Berlin Heidelberg |
rights | Springer-Verlag GmbH Germany, part of Springer Nature 2019 |
toplevel | peer_reviewed; online_resources |
tpages | 20 |
fulltext | fulltext |
identifier | ISSN: 1868-8071 |
ispartof | International journal of machine learning and cybernetics, 2020-02, Vol.11 (2), p.339-358 |
issn | 1868-8071; 1868-808X |
language | eng |
recordid | cdi_proquest_journals_2920625284 |
source | SpringerNature Journals; ProQuest Central UK/Ireland; ProQuest Central |
subjects | Accuracy; Application programming interface; Artificial Intelligence; Classifiers; Complex Systems; Computational Intelligence; Control; Cybersecurity; Datasets; Decision analysis; Decision trees; Disk operating systems; Dynamic link libraries; Empirical analysis; Engineering; Hybrid systems; Machine learning; Malware; Mechatronics; Original Article; Pattern Recognition; Robotics; Systems Biology |
title | An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A53%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20empirical%20study%20to%20estimate%20the%20stability%20of%20random%20forest%20classifier%20on%20the%20hybrid%20features%20recommended%20by%20filter%20based%20feature%20selection%20technique&rft.jtitle=International%20journal%20of%20machine%20learning%20and%20cybernetics&rft.au=Darshan,%20S.%20L.%20Shiva&rft.date=2020-02-01&rft.volume=11&rft.issue=2&rft.spage=339&rft.epage=358&rft.pages=339-358&rft.issn=1868-8071&rft.eissn=1868-808X&rft_id=info:doi/10.1007/s13042-019-00978-7&rft_dat=%3Cproquest_cross%3E2920625284%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920625284&rft_id=info:pmid/&rfr_iscdi=true |
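
The abstract describes training on features recommended by a filter-based feature selection technique. As a rough illustration of what a filter-based ranking can look like, the sketch below scores binary features by information gain and keeps the top-scoring ones. This record does not say which four FSTs the paper compares, so information gain is an assumption here, and the feature names, samples, and labels are hypothetical toy data rather than anything from the paper's datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(samples, labels, top_k):
    """Score every feature independently of any classifier (the 'filter'
    idea) and return the top_k (name, score) pairs, best first."""
    names = list(samples[0].keys())
    scores = {
        name: information_gain([s[name] for s in samples], labels)
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy data: 1 = feature present / malware, 0 = absent / benign.
# The names mix a dynamic-style feature (an API call seen at runtime) with
# static-style ones (import table, section entropy) purely for illustration.
SAMPLES = [
    {"calls_CreateRemoteThread": 1, "imports_user32": 1, "high_entropy_section": 1},
    {"calls_CreateRemoteThread": 1, "imports_user32": 0, "high_entropy_section": 1},
    {"calls_CreateRemoteThread": 1, "imports_user32": 1, "high_entropy_section": 1},
    {"calls_CreateRemoteThread": 1, "imports_user32": 0, "high_entropy_section": 0},
    {"calls_CreateRemoteThread": 0, "imports_user32": 1, "high_entropy_section": 1},
    {"calls_CreateRemoteThread": 0, "imports_user32": 0, "high_entropy_section": 0},
    {"calls_CreateRemoteThread": 0, "imports_user32": 1, "high_entropy_section": 0},
    {"calls_CreateRemoteThread": 0, "imports_user32": 0, "high_entropy_section": 0},
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    for name, score in rank_features(SAMPLES, LABELS, top_k=2):
        print(f"{name}: {score:.3f}")
```

Because the scores are computed without training a classifier, the selected subset can then be passed to any model, such as a random forest whose number of trees is varied to see where accuracy stops improving.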