Differential area analysis for ransomware attack detection within mixed file datasets

The threat from ransomware continues to grow both in the number of affected victims as well as the cost incurred by the people and organisations impacted in a successful attack. In the majority of cases, once a victim has been attacked there remain only two courses of action open to them; either pay...

Bibliographic details
Published in:	Computers & security 2021-09, Vol.108, p.102377, Article 102377
Main authors:	Davies, Simon R.; Macfarlane, Richard; Buchanan, William J.
Format:	Article
Language:	eng
Subjects:	Archives & records; Data encryption; Datasets; Entropy; Entropy (Information theory); Model accuracy; Phobos; Random numbers; Ransomware; Ransomware detection; Test data sets
Online access:	Full text

container_end_page
container_issue
container_start_page	102377
container_title	Computers & security
container_volume	108
creator	Davies, Simon R.; Macfarlane, Richard; Buchanan, William J.
description	The threat from ransomware continues to grow, both in the number of affected victims and in the cost incurred by the people and organisations impacted in a successful attack. In the majority of cases, once a victim has been attacked there remain only two courses of action open to them: either pay the ransom or lose their data. One common behaviour shared by all crypto ransomware strains is that at some point during their execution they will attempt to encrypt the users’ files. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the strain of the ransomware. An enhanced mixed file ransomware data set of more than 130,000 files was developed based on the govdocs1 (Garfinkel, 2020) corpus. This data set was enriched to contain examples of files that reflect the more modern Microsoft file formats, as well as examples of high entropy file formats such as compressed files and archives. The data set also contained eight different sets of files that were generated as the result of different real-world high profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research (Penrose et al., 2013; Zhao et al., 2011) has highlighted the difficulty in differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similar values. One of the experiments described in this paper shows a unique characteristic for the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate between encrypted files and other high entropy files such as archives. This discovery was leveraged in the development of a file classification model that used the differential area between the entropy curve of a file under analysis and one generated from random data. When comparing the entropy plot values of a file under analysis against those generated from a file containing purely random numbers, the greater the correlation of the plots, the higher the confidence that the file under analysis contains encrypted data. The experiments demonstrate a high degree of confidence in the accuracy of the model, which achieved a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. This technique successfully addresses the problem of using file entropy to differentiate compressed and archived files from files encrypted by ransomware in a timely manner.
doi_str_mv	10.1016/j.cose.2021.102377
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2561518248</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167404821002017</els_id><sourcerecordid>2561518248</sourcerecordid><originalsourceid>FETCH-LOGICAL-c372t-bc8d2fb0591e2b2ca7d4905a6b8271f21a63a4ecbcf7c121713019fe4a274a4f3</originalsourceid><addsrcrecordid>eNp9kE9LAzEQxYMoWKtfwFPA89Ykm92k4EXqXyh4secwm51g1u2mJqm1394t9expYOa94b0fIdeczTjj9W03syHhTDDBx4UolTohE66VKGrB9CmZjCJVSCb1OblIqWOMq1rrCVk9eOcw4pA99BQiAoUB-n3yiboQaYQhhfVuPFDIGewnbTGjzT4MdOfzhx_o2v9gS53vkbaQIWFOl-TMQZ_w6m9Oyerp8X3xUizfnl8X98vClkrkorG6Fa5h1ZyjaIQF1co5q6ButFDcCQ51CRJtY52yXHDFS8bnDiUIJUG6ckpujn83MXxtMWXThW0c8ycjqppXXAupR5U4qmwMKUV0ZhP9GuLecGYO-ExnDvjMAZ854htNd0cTjvm_PUaTrMfBYuvj2N-0wf9n_wWCF3mV</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2561518248</pqid></control><display><type>article</type><title>Differential area analysis for ransomware attack detection within mixed file datasets</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Davies, Simon R. ; Macfarlane, Richard ; Buchanan, William J.</creator><creatorcontrib>Davies, Simon R. ; Macfarlane, Richard ; Buchanan, William J.</creatorcontrib><description>The threat from ransomware continues to grow both in the number of affected victims as well as the cost incurred by the people and organisations impacted in a successful attack. In the majority of cases, once a victim has been attacked there remain only two courses of action open to them; either pay the ransom or lose their data. One common behaviour shared between all crypto ransomware strains is that at some point during their execution they will attempt to encrypt the users’ files. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the strain of the ransomware. 
An enhanced mixed file ransomware data set of more than 130,000 files was developed based on the govdocs1(Garfinkel, 2020) corpus. This data set was enriched to contain examples of files that reflect the more modern Microsoft file formats, as well as examples of high entropy file formats such as compressed files and archives. The data set also contained eight different sets of files that were generated as the result of different real-world high profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research Penrose et al. (2013); Zhao et al. (2011) has highlighted the difficulty in differentiating between compressed and encrypted files using Shannon entropy as both file types exhibit similar values. One of the experiments described in this paper shows a unique characteristic for the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate between encrypted files and other high entropy files such as archives. This discovery was leveraged in the development of a file classification model that used the differential area between the entropy curve of a file under analysis and one generated from random data. When comparing the entropy plot values of a file under analysis against one generated by a file containing purely random numbers, the greater the correlation of the plots is, the higher the confidence that the file under analysis contains encrypted data. The experiments demonstrate a high degree of confidence in the accuracy of the model achieving a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. 
This technique successfully addresses the problem of using file entropy to differentiate compressed and archived files from files encrypted by ransomware in a timely manner.</description><identifier>ISSN: 0167-4048</identifier><identifier>EISSN: 1872-6208</identifier><identifier>DOI: 10.1016/j.cose.2021.102377</identifier><language>eng</language><publisher>Amsterdam: Elsevier Ltd</publisher><subject>Archives & records ; Data encryption ; Datasets ; Entropy ; Entropy (Information theory) ; Model accuracy ; Phobos ; Random numbers ; Ransomware ; Ransomware detection ; Test data sets</subject><ispartof>Computers & security, 2021-09, Vol.108, p.102377, Article 102377</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright Elsevier Sequoia S.A. Sep 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c372t-bc8d2fb0591e2b2ca7d4905a6b8271f21a63a4ecbcf7c121713019fe4a274a4f3</citedby><cites>FETCH-LOGICAL-c372t-bc8d2fb0591e2b2ca7d4905a6b8271f21a63a4ecbcf7c121713019fe4a274a4f3</cites><orcidid>0000-0001-9377-4539</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.cose.2021.102377$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Davies, Simon R.</creatorcontrib><creatorcontrib>Macfarlane, Richard</creatorcontrib><creatorcontrib>Buchanan, William J.</creatorcontrib><title>Differential area analysis for ransomware attack detection within mixed file datasets</title><title>Computers & security</title><description>The threat from ransomware continues to grow both in the number of affected victims as well as the cost incurred by the people and organisations impacted in a successful attack. 
In the majority of cases, once a victim has been attacked there remain only two courses of action open to them; either pay the ransom or lose their data. One common behaviour shared between all crypto ransomware strains is that at some point during their execution they will attempt to encrypt the users’ files. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the strain of the ransomware. An enhanced mixed file ransomware data set of more than 130,000 files was developed based on the govdocs1(Garfinkel, 2020) corpus. This data set was enriched to contain examples of files that reflect the more modern Microsoft file formats, as well as examples of high entropy file formats such as compressed files and archives. The data set also contained eight different sets of files that were generated as the result of different real-world high profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research Penrose et al. (2013); Zhao et al. (2011) has highlighted the difficulty in differentiating between compressed and encrypted files using Shannon entropy as both file types exhibit similar values. One of the experiments described in this paper shows a unique characteristic for the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate between encrypted files and other high entropy files such as archives. This discovery was leveraged in the development of a file classification model that used the differential area between the entropy curve of a file under analysis and one generated from random data. When comparing the entropy plot values of a file under analysis against one generated by a file containing purely random numbers, the greater the correlation of the plots is, the higher the confidence that the file under analysis contains encrypted data. 
The experiments demonstrate a high degree of confidence in the accuracy of the model achieving a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. This technique successfully addresses the problem of using file entropy to differentiate compressed and archived files from files encrypted by ransomware in a timely manner.</description><subject>Archives & records</subject><subject>Data encryption</subject><subject>Datasets</subject><subject>Entropy</subject><subject>Entropy (Information theory)</subject><subject>Model accuracy</subject><subject>Phobos</subject><subject>Random numbers</subject><subject>Ransomware</subject><subject>Ransomware detection</subject><subject>Test data sets</subject><issn>0167-4048</issn><issn>1872-6208</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LAzEQxYMoWKtfwFPA89Ykm92k4EXqXyh4secwm51g1u2mJqm1394t9expYOa94b0fIdeczTjj9W03syHhTDDBx4UolTohE66VKGrB9CmZjCJVSCb1OblIqWOMq1rrCVk9eOcw4pA99BQiAoUB-n3yiboQaYQhhfVuPFDIGewnbTGjzT4MdOfzhx_o2v9gS53vkbaQIWFOl-TMQZ_w6m9Oyerp8X3xUizfnl8X98vClkrkorG6Fa5h1ZyjaIQF1co5q6ButFDcCQ51CRJtY52yXHDFS8bnDiUIJUG6ckpujn83MXxtMWXThW0c8ycjqppXXAupR5U4qmwMKUV0ZhP9GuLecGYO-ExnDvjMAZ854htNd0cTjvm_PUaTrMfBYuvj2N-0wf9n_wWCF3mV</recordid><startdate>202109</startdate><enddate>202109</enddate><creator>Davies, Simon R.</creator><creator>Macfarlane, Richard</creator><creator>Buchanan, William J.</creator><general>Elsevier Ltd</general><general>Elsevier Sequoia S.A</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>K7.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9377-4539</orcidid></search><sort><creationdate>202109</creationdate><title>Differential area analysis for ransomware attack detection within mixed file datasets</title><author>Davies, 
Simon R. ; Macfarlane, Richard ; Buchanan, William J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c372t-bc8d2fb0591e2b2ca7d4905a6b8271f21a63a4ecbcf7c121713019fe4a274a4f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Archives & records</topic><topic>Data encryption</topic><topic>Datasets</topic><topic>Entropy</topic><topic>Entropy (Information theory)</topic><topic>Model accuracy</topic><topic>Phobos</topic><topic>Random numbers</topic><topic>Ransomware</topic><topic>Ransomware detection</topic><topic>Test data sets</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Davies, Simon R.</creatorcontrib><creatorcontrib>Macfarlane, Richard</creatorcontrib><creatorcontrib>Buchanan, William J.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Criminal Justice (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computers & security</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Davies, Simon R.</au><au>Macfarlane, Richard</au><au>Buchanan, William J.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Differential area analysis for ransomware attack detection within mixed file datasets</atitle><jtitle>Computers & 
security</jtitle><date>2021-09</date><risdate>2021</risdate><volume>108</volume><spage>102377</spage><pages>102377-</pages><artnum>102377</artnum><issn>0167-4048</issn><eissn>1872-6208</eissn><abstract>The threat from ransomware continues to grow both in the number of affected victims as well as the cost incurred by the people and organisations impacted in a successful attack. In the majority of cases, once a victim has been attacked there remain only two courses of action open to them; either pay the ransom or lose their data. One common behaviour shared between all crypto ransomware strains is that at some point during their execution they will attempt to encrypt the users’ files. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the strain of the ransomware. An enhanced mixed file ransomware data set of more than 130,000 files was developed based on the govdocs1(Garfinkel, 2020) corpus. This data set was enriched to contain examples of files that reflect the more modern Microsoft file formats, as well as examples of high entropy file formats such as compressed files and archives. The data set also contained eight different sets of files that were generated as the result of different real-world high profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research Penrose et al. (2013); Zhao et al. (2011) has highlighted the difficulty in differentiating between compressed and encrypted files using Shannon entropy as both file types exhibit similar values. One of the experiments described in this paper shows a unique characteristic for the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate between encrypted files and other high entropy files such as archives. 
This discovery was leveraged in the development of a file classification model that used the differential area between the entropy curve of a file under analysis and one generated from random data. When comparing the entropy plot values of a file under analysis against one generated by a file containing purely random numbers, the greater the correlation of the plots is, the higher the confidence that the file under analysis contains encrypted data. The experiments demonstrate a high degree of confidence in the accuracy of the model achieving a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. This technique successfully addresses the problem of using file entropy to differentiate compressed and archived files from files encrypted by ransomware in a timely manner.</abstract><cop>Amsterdam</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.cose.2021.102377</doi><orcidid>https://orcid.org/0000-0001-9377-4539</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0167-4048
ispartof	Computers & security, 2021-09, Vol.108, p.102377, Article 102377
issn	0167-4048 1872-6208
language	eng
recordid	cdi_proquest_journals_2561518248
source	Elsevier ScienceDirect Journals Complete
subjects	Archives & records; Data encryption; Datasets; Entropy; Entropy (Information theory); Model accuracy; Phobos; Random numbers; Ransomware; Ransomware detection; Test data sets
title	Differential area analysis for ransomware attack detection within mixed file datasets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T00%3A09%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Differential%20area%20analysis%20for%20ransomware%20attack%20detection%20within%20mixed%20file%20datasets&rft.jtitle=Computers%20&%20security&rft.au=Davies,%20Simon%20R.&rft.date=2021-09&rft.volume=108&rft.spage=102377&rft.pages=102377-&rft.artnum=102377&rft.issn=0167-4048&rft.eissn=1872-6208&rft_id=info:doi/10.1016/j.cose.2021.102377&rft_dat=%3Cproquest_cross%3E2561518248%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2561518248&rft_id=info:pmid/&rft_els_id=S0167404821002017&rfr_iscdi=true
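
Illustrative sketch (not part of the catalogue record). The description field above summarises a model that compares the Shannon entropy curve of a file header with a curve generated from random data and uses the area between the two. The Python below is a minimal sketch of that idea, not the authors' implementation. The record states only the 192-byte header length; the 8-byte step, the number of random samples, the trapezoidal integration and all function names are assumptions made for illustration, and no decision threshold is given because the record does not state one.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_curve(data: bytes, header_len: int = 192, step: int = 8) -> list:
    """Entropy of the first step, 2*step, ... bytes, up to header_len bytes."""
    return [shannon_entropy(data[:end]) for end in range(step, header_len + 1, step)]


def reference_curve(header_len: int = 192, step: int = 8, samples: int = 200) -> list:
    """Average entropy curve over several random headers (assumed baseline)."""
    curves = [
        entropy_curve(os.urandom(header_len), header_len, step)
        for _ in range(samples)
    ]
    return [sum(points) / samples for points in zip(*curves)]


def differential_area(curve: list, reference: list, step: int = 8) -> float:
    """Trapezoidal estimate of the area between two entropy curves."""
    gaps = [abs(r - c) for c, r in zip(curve, reference)]
    return step * sum((a + b) / 2 for a, b in zip(gaps, gaps[1:]))


if __name__ == "__main__":
    reference = reference_curve()
    random_header = os.urandom(192)          # stand-in for encrypted content
    text_header = (b"plain text header " * 11)[:192]
    for label, header in (("random", random_header), ("text", text_header)):
        area = differential_area(entropy_curve(header), reference)
        print(f"{label}: differential area = {area:.2f}")
```

Under these assumptions, a smaller area means the header's entropy curve tracks the random baseline more closely, which the description associates with encrypted content. Any cut-off between "encrypted" and "not encrypted" would have to be calibrated on labelled data.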