APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups
Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facili...
Published in: | IEEE access 2024, Vol.12, p.145148-145158 |
---|---|
Main authors: | Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 145158 |
---|---|
container_issue | |
container_start_page | 145148 |
container_title | IEEE access |
container_volume | 12 |
creator | Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza |
description | Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, the two most important being aiding incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to call for further investigation to enhance attribution methodologies. Developing an effective attribution system depends on a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying numbers of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution. |
doi_str_mv | 10.1109/ACCESS.2024.3473021 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10704627</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10704627</ieee_id><doaj_id>oai_doaj_org_article_728706fed1d6438db17352aa78efa3a2</doaj_id><sourcerecordid>3115573897</sourcerecordid><originalsourceid>FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</originalsourceid><addsrcrecordid>eNpNUdtqHDEMHUoKDUm-oH0w9Hk3tjW-TN-mkysktJD0rWC0vrSznay3tjclf19vJoToQRJC5xyk0zQfGV0yRrvTfhjO7-6WnPJ2Ca0Cytm75pAz2S1AgDx4039oTnJe0xq6joQ6bH723-8T2j8-fSE9GeLDNvnffpPHR09w40i_wempjBYncovTP0yenGHB7Av5WrMjcUP6UtK42pWx9iWSykguU9xt83HzPuCU_clLPWp-XJzfD1eLm2-X10N_s7Bcd2WhpBYsKBYA_MrZwGzrpEJhg-aBKgZaSyEgMN5pGVQHq2ChleA8CmmtgqPmeuZ1Eddmm8YHTE8m4mieBzH9MpjqEZM3imtFZfCOOdmCdiumQHBEpX1AQF65Ps9c2xT_7nwuZh13qX4hG2BMCAW62yvCvGVTzDn58KrKqNm7YmZXzN4V8-JKRX2aUaP3_g1C0VZyBf8BgdSHLQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3115573897</pqid></control><display><type>article</type><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</creator><creatorcontrib>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</creatorcontrib><description>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. 
These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3473021</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Analytical models ; APT ; attribution ; Data models ; dataset ; Datasets ; Decision trees ; Deep learning ; Feature extraction ; Focusing ; Machine learning ; Malware ; malware attribution ; Random forests ; Source coding ; Training data ; Vectors</subject><ispartof>IEEE access, 2024, Vol.12, p.145148-145158</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</cites><orcidid>0000-0002-4723-5793 ; 0009-0000-8387-5619</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10704627$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Erfan Mazaheri, Mohamad</creatorcontrib><creatorcontrib>Shameli-Sendi, Alireza</creatorcontrib><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><title>IEEE access</title><addtitle>Access</addtitle><description>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. 
Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</description><subject>Accuracy</subject><subject>Analytical models</subject><subject>APT</subject><subject>attribution</subject><subject>Data models</subject><subject>dataset</subject><subject>Datasets</subject><subject>Decision trees</subject><subject>Deep learning</subject><subject>Feature extraction</subject><subject>Focusing</subject><subject>Machine learning</subject><subject>Malware</subject><subject>malware attribution</subject><subject>Random forests</subject><subject>Source coding</subject><subject>Training 
data</subject><subject>Vectors</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUdtqHDEMHUoKDUm-oH0w9Hk3tjW-TN-mkysktJD0rWC0vrSznay3tjclf19vJoToQRJC5xyk0zQfGV0yRrvTfhjO7-6WnPJ2Ca0Cytm75pAz2S1AgDx4039oTnJe0xq6joQ6bH723-8T2j8-fSE9GeLDNvnffpPHR09w40i_wempjBYncovTP0yenGHB7Av5WrMjcUP6UtK42pWx9iWSykguU9xt83HzPuCU_clLPWp-XJzfD1eLm2-X10N_s7Bcd2WhpBYsKBYA_MrZwGzrpEJhg-aBKgZaSyEgMN5pGVQHq2ChleA8CmmtgqPmeuZ1Eddmm8YHTE8m4mieBzH9MpjqEZM3imtFZfCOOdmCdiumQHBEpX1AQF65Ps9c2xT_7nwuZh13qX4hG2BMCAW62yvCvGVTzDn58KrKqNm7YmZXzN4V8-JKRX2aUaP3_g1C0VZyBf8BgdSHLQ</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Erfan Mazaheri, Mohamad</creator><creator>Shameli-Sendi, Alireza</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4723-5793</orcidid><orcidid>https://orcid.org/0009-0000-8387-5619</orcidid></search><sort><creationdate>2024</creationdate><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><author>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Analytical 
models</topic><topic>APT</topic><topic>attribution</topic><topic>Data models</topic><topic>dataset</topic><topic>Datasets</topic><topic>Decision trees</topic><topic>Deep learning</topic><topic>Feature extraction</topic><topic>Focusing</topic><topic>Machine learning</topic><topic>Malware</topic><topic>malware attribution</topic><topic>Random forests</topic><topic>Source coding</topic><topic>Training data</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Erfan Mazaheri, Mohamad</creatorcontrib><creatorcontrib>Shameli-Sendi, Alireza</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Erfan Mazaheri, Mohamad</au><au>Shameli-Sendi, Alireza</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2024</date><risdate>2024</risdate><volume>12</volume><spage>145148</spage><epage>145158</epage><pages>145148-145158</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. 
Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3473021</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-4723-5793</orcidid><orcidid>https://orcid.org/0009-0000-8387-5619</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.145148-145158 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_10704627 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Accuracy ; Analytical models ; APT ; attribution ; Data models ; dataset ; Datasets ; Decision trees ; Deep learning ; Feature extraction ; Focusing ; Machine learning ; Malware ; malware attribution ; Random forests ; Source coding ; Training data ; Vectors |
title | APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T12%3A27%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=APTracker:%20A%20Comprehensive%20and%20Analytical%20Malware%20Dataset%20Based%20on%20Attribution%20to%20APT%20Groups&rft.jtitle=IEEE%20access&rft.au=Erfan%20Mazaheri,%20Mohamad&rft.date=2024&rft.volume=12&rft.spage=145148&rft.epage=145158&rft.pages=145148-145158&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3473021&rft_dat=%3Cproquest_ieee_%3E3115573897%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3115573897&rft_id=info:pmid/&rft_ieee_id=10704627&rft_doaj_id=oai_doaj_org_article_728706fed1d6438db17352aa78efa3a2&rfr_iscdi=true |