An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique

The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial im...

Bibliographic Details
Published in: International journal of machine learning and cybernetics 2020-02, Vol.11 (2), p.339-358
Main Authors: Darshan, S. L. Shiva; Jaidhar, C. D.
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 358
container_issue 2
container_start_page 339
container_title International journal of machine learning and cybernetics
container_volume 11
creator Darshan, S. L. Shiva
Jaidhar, C. D.
description The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial improvement over the traditional signature-based detection technique, current malware employs code obfuscation techniques to elude detection. This paper presents the Hybrid Features-based malware detection system (HFMDS) that integrates static and dynamic features of the portable executable (PE) files to discern malware. The HFMDS is trained with prominent features recommended by the filter-based feature selection technique (FST). The detection ability of the proposed HFMDS has been evaluated with the random forest (RF) classifier on two different datasets that consist of real-world Windows malware samples. In-depth analysis is carried out to determine the optimal number of decision trees (DTs) required by the RF classifier to achieve consistent accuracy. In addition, the performance of four popular FSTs is analyzed to determine which FST recommends the best features. From the experimental analysis, we can infer that increasing the number of DTs beyond 160 within the RF classifier does not make a significant difference in detection accuracy.
doi_str_mv 10.1007/s13042-019-00978-7
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2920625284</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2920625284</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-61b0ed5aff07997361a30377bd7ff49a27f0738022d664469393181f8bba1e83</originalsourceid><addsrcrecordid>eNp9kc1OxCAUhRujiROdF3BF4rp6gVpgOZn4l0ziZhbuCG0vDqYtIzCLPomvK1oz7mQDuXznnOSeoriicEMBxG2kHCpWAlUlgBKyFCfFgspalhLk6-nxLeh5sYzxHfKpgXNgi-JzNRIc9i641vQkpkM3keQJxuQGk5CkHeapaVzv0kS8JcGMnR-I9SEzpO1NjM46DMSPP_BuaoLriEWTDhkhAVs_DDh22JFmItb1KcONiXiESMQe2-S-HbDdje7jgJfFmTV9xOXvfVFsH-6366dy8_L4vF5typZTlcqaNoDdnbEWhFKC19Rw4EI0nbC2UoaJ_MElMNbVdVXViitOJbWyaQxFyS-K69l2H3xOjUm_-0MYc6JmikHN7pisMsVmqg0-xoBW70NeT5g0Bf1dgZ4r0LkC_VOBFlnEZ1HM8PiG4c_6H9UXi7aL8A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2920625284</pqid></control><display><type>article</type><title>An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique</title><source>SpringerNature Journals</source><source>ProQuest Central UK/Ireland</source><source>ProQuest Central</source><creator>Darshan, S. L. Shiva ; Jaidhar, C. D.</creator><creatorcontrib>Darshan, S. L. Shiva ; Jaidhar, C. D.</creatorcontrib><description>The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial improvement over the traditional signature-based detection technique, current malware employs code obfuscation techniques to elude detection. 
This paper presents the Hybrid Features-based malware detection system (HFMDS) that integrates static and dynamic features of the portable executable (PE) files to discern malware. The HFMDS is trained with prominent features advised by the filter-based feature selection technique (FST). The detection ability of the proposed HFMDS has evaluated with the random forest (RF) classifier by considering two different datasets that consist of real-world Windows malware samples. In-depth analysis is carried out to determine the optimal number of decision trees (DTs) required by the RF classifier to achieve consistent accuracy. Besides, four popular FSTs performance is also analyzed to determine which FST recommends the best features. From the experimental analysis, we can infer that increasing the number of DTs after 160 within the RF classifier does not make a significant difference in attaining better detection accuracy.</description><identifier>ISSN: 1868-8071</identifier><identifier>EISSN: 1868-808X</identifier><identifier>DOI: 10.1007/s13042-019-00978-7</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Accuracy ; Application programming interface ; Artificial Intelligence ; Classifiers ; Complex Systems ; Computational Intelligence ; Control ; Cybersecurity ; Datasets ; Decision analysis ; Decision trees ; Disk operating systems ; Dynamic link libraries ; Empirical analysis ; Engineering ; Hybrid systems ; Machine learning ; Malware ; Mechatronics ; Original Article ; Pattern Recognition ; Robotics ; Systems Biology</subject><ispartof>International journal of machine learning and cybernetics, 2020-02, Vol.11 (2), p.339-358</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Springer-Verlag GmbH Germany, part of Springer Nature 
2019.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-61b0ed5aff07997361a30377bd7ff49a27f0738022d664469393181f8bba1e83</citedby><cites>FETCH-LOGICAL-c319t-61b0ed5aff07997361a30377bd7ff49a27f0738022d664469393181f8bba1e83</cites><orcidid>0000-0001-9556-1342</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s13042-019-00978-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2920625284?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,21388,27924,27925,33744,41488,42557,43805,51319,64385,64389,72469</link.rule.ids></links><search><creatorcontrib>Darshan, S. L. Shiva</creatorcontrib><creatorcontrib>Jaidhar, C. D.</creatorcontrib><title>An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique</title><title>International journal of machine learning and cybernetics</title><addtitle>Int. J. Mach. Learn. &amp; Cyber</addtitle><description>The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial improvement over the traditional signature-based detection technique, current malware employs code obfuscation techniques to elude detection. This paper presents the Hybrid Features-based malware detection system (HFMDS) that integrates static and dynamic features of the portable executable (PE) files to discern malware. The HFMDS is trained with prominent features advised by the filter-based feature selection technique (FST). 
The detection ability of the proposed HFMDS has evaluated with the random forest (RF) classifier by considering two different datasets that consist of real-world Windows malware samples. In-depth analysis is carried out to determine the optimal number of decision trees (DTs) required by the RF classifier to achieve consistent accuracy. Besides, four popular FSTs performance is also analyzed to determine which FST recommends the best features. From the experimental analysis, we can infer that increasing the number of DTs after 160 within the RF classifier does not make a significant difference in attaining better detection accuracy.</description><subject>Accuracy</subject><subject>Application programming interface</subject><subject>Artificial Intelligence</subject><subject>Classifiers</subject><subject>Complex Systems</subject><subject>Computational Intelligence</subject><subject>Control</subject><subject>Cybersecurity</subject><subject>Datasets</subject><subject>Decision analysis</subject><subject>Decision trees</subject><subject>Disk operating systems</subject><subject>Dynamic link libraries</subject><subject>Empirical analysis</subject><subject>Engineering</subject><subject>Hybrid systems</subject><subject>Machine learning</subject><subject>Malware</subject><subject>Mechatronics</subject><subject>Original Article</subject><subject>Pattern Recognition</subject><subject>Robotics</subject><subject>Systems 
Biology</subject><issn>1868-8071</issn><issn>1868-808X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kc1OxCAUhRujiROdF3BF4rp6gVpgOZn4l0ziZhbuCG0vDqYtIzCLPomvK1oz7mQDuXznnOSeoriicEMBxG2kHCpWAlUlgBKyFCfFgspalhLk6-nxLeh5sYzxHfKpgXNgi-JzNRIc9i641vQkpkM3keQJxuQGk5CkHeapaVzv0kS8JcGMnR-I9SEzpO1NjM46DMSPP_BuaoLriEWTDhkhAVs_DDh22JFmItb1KcONiXiESMQe2-S-HbDdje7jgJfFmTV9xOXvfVFsH-6366dy8_L4vF5typZTlcqaNoDdnbEWhFKC19Rw4EI0nbC2UoaJ_MElMNbVdVXViitOJbWyaQxFyS-K69l2H3xOjUm_-0MYc6JmikHN7pisMsVmqg0-xoBW70NeT5g0Bf1dgZ4r0LkC_VOBFlnEZ1HM8PiG4c_6H9UXi7aL8A</recordid><startdate>20200201</startdate><enddate>20200201</enddate><creator>Darshan, S. L. Shiva</creator><creator>Jaidhar, C. D.</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><orcidid>https://orcid.org/0000-0001-9556-1342</orcidid></search><sort><creationdate>20200201</creationdate><title>An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique</title><author>Darshan, S. L. Shiva ; Jaidhar, C. 
D.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-61b0ed5aff07997361a30377bd7ff49a27f0738022d664469393181f8bba1e83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Application programming interface</topic><topic>Artificial Intelligence</topic><topic>Classifiers</topic><topic>Complex Systems</topic><topic>Computational Intelligence</topic><topic>Control</topic><topic>Cybersecurity</topic><topic>Datasets</topic><topic>Decision analysis</topic><topic>Decision trees</topic><topic>Disk operating systems</topic><topic>Dynamic link libraries</topic><topic>Empirical analysis</topic><topic>Engineering</topic><topic>Hybrid systems</topic><topic>Machine learning</topic><topic>Malware</topic><topic>Mechatronics</topic><topic>Original Article</topic><topic>Pattern Recognition</topic><topic>Robotics</topic><topic>Systems Biology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Darshan, S. L. Shiva</creatorcontrib><creatorcontrib>Jaidhar, C. 
D.</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><jtitle>International journal of machine learning and cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Darshan, S. L. Shiva</au><au>Jaidhar, C. D.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique</atitle><jtitle>International journal of machine learning and cybernetics</jtitle><stitle>Int. J. Mach. Learn. 
&amp; Cyber</stitle><date>2020-02-01</date><risdate>2020</risdate><volume>11</volume><issue>2</issue><spage>339</spage><epage>358</epage><pages>339-358</pages><issn>1868-8071</issn><eissn>1868-808X</eissn><abstract>The emergence of advanced malware is a serious threat to information security. A prominent technique that identifies sophisticated malware should consider the runtime behaviour of the source file to detect malicious intent. Although the behaviour-based malware detection technique is a substantial improvement over the traditional signature-based detection technique, current malware employs code obfuscation techniques to elude detection. This paper presents the Hybrid Features-based malware detection system (HFMDS) that integrates static and dynamic features of the portable executable (PE) files to discern malware. The HFMDS is trained with prominent features advised by the filter-based feature selection technique (FST). The detection ability of the proposed HFMDS has evaluated with the random forest (RF) classifier by considering two different datasets that consist of real-world Windows malware samples. In-depth analysis is carried out to determine the optimal number of decision trees (DTs) required by the RF classifier to achieve consistent accuracy. Besides, four popular FSTs performance is also analyzed to determine which FST recommends the best features. From the experimental analysis, we can infer that increasing the number of DTs after 160 within the RF classifier does not make a significant difference in attaining better detection accuracy.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s13042-019-00978-7</doi><tpages>20</tpages><orcidid>https://orcid.org/0000-0001-9556-1342</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1868-8071
ispartof International journal of machine learning and cybernetics, 2020-02, Vol.11 (2), p.339-358
issn 1868-8071
1868-808X
language eng
recordid cdi_proquest_journals_2920625284
source SpringerNature Journals; ProQuest Central UK/Ireland; ProQuest Central
subjects Accuracy
Application programming interface
Artificial Intelligence
Classifiers
Complex Systems
Computational Intelligence
Control
Cybersecurity
Datasets
Decision analysis
Decision trees
Disk operating systems
Dynamic link libraries
Empirical analysis
Engineering
Hybrid systems
Machine learning
Malware
Mechatronics
Original Article
Pattern Recognition
Robotics
Systems Biology
title An empirical study to estimate the stability of random forest classifier on the hybrid features recommended by filter based feature selection technique
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A53%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20empirical%20study%20to%20estimate%20the%20stability%20of%20random%20forest%20classifier%20on%20the%20hybrid%20features%20recommended%20by%20filter%20based%20feature%20selection%20technique&rft.jtitle=International%20journal%20of%20machine%20learning%20and%20cybernetics&rft.au=Darshan,%20S.%20L.%20Shiva&rft.date=2020-02-01&rft.volume=11&rft.issue=2&rft.spage=339&rft.epage=358&rft.pages=339-358&rft.issn=1868-8071&rft.eissn=1868-808X&rft_id=info:doi/10.1007/s13042-019-00978-7&rft_dat=%3Cproquest_cross%3E2920625284%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920625284&rft_id=info:pmid/&rfr_iscdi=true