APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups

Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important being aiding incident response and facili...

Bibliographic details
Published in: IEEE access 2024, Vol. 12, p. 145148-145158
Main authors: Erfan Mazaheri, Mohamad; Shameli-Sendi, Alireza
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 145158
container_issue
container_start_page 145148
container_title IEEE access
container_volume 12
creator Erfan Mazaheri, Mohamad
Shameli-Sendi, Alireza
description Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important being aiding incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field still calls for further investigation to enhance attribution methodologies. An effective attribution system depends on a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and a varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and the APT groups that developed them originate from Russia, South Korea, China, the USA, Nigeria, North Korea, Pakistan, and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.
doi_str_mv 10.1109/ACCESS.2024.3473021
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10704627</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10704627</ieee_id><doaj_id>oai_doaj_org_article_728706fed1d6438db17352aa78efa3a2</doaj_id><sourcerecordid>3115573897</sourcerecordid><originalsourceid>FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</originalsourceid><addsrcrecordid>eNpNUdtqHDEMHUoKDUm-oH0w9Hk3tjW-TN-mkysktJD0rWC0vrSznay3tjclf19vJoToQRJC5xyk0zQfGV0yRrvTfhjO7-6WnPJ2Ca0Cytm75pAz2S1AgDx4039oTnJe0xq6joQ6bH723-8T2j8-fSE9GeLDNvnffpPHR09w40i_wempjBYncovTP0yenGHB7Av5WrMjcUP6UtK42pWx9iWSykguU9xt83HzPuCU_clLPWp-XJzfD1eLm2-X10N_s7Bcd2WhpBYsKBYA_MrZwGzrpEJhg-aBKgZaSyEgMN5pGVQHq2ChleA8CmmtgqPmeuZ1Eddmm8YHTE8m4mieBzH9MpjqEZM3imtFZfCOOdmCdiumQHBEpX1AQF65Ps9c2xT_7nwuZh13qX4hG2BMCAW62yvCvGVTzDn58KrKqNm7YmZXzN4V8-JKRX2aUaP3_g1C0VZyBf8BgdSHLQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3115573897</pqid></control><display><type>article</type><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</creator><creatorcontrib>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</creatorcontrib><description>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. 
These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3473021</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Analytical models ; APT ; attribution ; Data models ; dataset ; Datasets ; Decision trees ; Deep learning ; Feature extraction ; Focusing ; Machine learning ; Malware ; malware attribution ; Random forests ; Source coding ; Training data ; Vectors</subject><ispartof>IEEE access, 2024, Vol.12, p.145148-145158</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</cites><orcidid>0000-0002-4723-5793 ; 0009-0000-8387-5619</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10704627$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Erfan Mazaheri, Mohamad</creatorcontrib><creatorcontrib>Shameli-Sendi, Alireza</creatorcontrib><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><title>IEEE access</title><addtitle>Access</addtitle><description>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. 
Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</description><subject>Accuracy</subject><subject>Analytical models</subject><subject>APT</subject><subject>attribution</subject><subject>Data models</subject><subject>dataset</subject><subject>Datasets</subject><subject>Decision trees</subject><subject>Deep learning</subject><subject>Feature extraction</subject><subject>Focusing</subject><subject>Machine learning</subject><subject>Malware</subject><subject>malware attribution</subject><subject>Random forests</subject><subject>Source coding</subject><subject>Training 
data</subject><subject>Vectors</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUdtqHDEMHUoKDUm-oH0w9Hk3tjW-TN-mkysktJD0rWC0vrSznay3tjclf19vJoToQRJC5xyk0zQfGV0yRrvTfhjO7-6WnPJ2Ca0Cytm75pAz2S1AgDx4039oTnJe0xq6joQ6bH723-8T2j8-fSE9GeLDNvnffpPHR09w40i_wempjBYncovTP0yenGHB7Av5WrMjcUP6UtK42pWx9iWSykguU9xt83HzPuCU_clLPWp-XJzfD1eLm2-X10N_s7Bcd2WhpBYsKBYA_MrZwGzrpEJhg-aBKgZaSyEgMN5pGVQHq2ChleA8CmmtgqPmeuZ1Eddmm8YHTE8m4mieBzH9MpjqEZM3imtFZfCOOdmCdiumQHBEpX1AQF65Ps9c2xT_7nwuZh13qX4hG2BMCAW62yvCvGVTzDn58KrKqNm7YmZXzN4V8-JKRX2aUaP3_g1C0VZyBf8BgdSHLQ</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Erfan Mazaheri, Mohamad</creator><creator>Shameli-Sendi, Alireza</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4723-5793</orcidid><orcidid>https://orcid.org/0009-0000-8387-5619</orcidid></search><sort><creationdate>2024</creationdate><title>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</title><author>Erfan Mazaheri, Mohamad ; Shameli-Sendi, Alireza</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c289t-76851f71f33ebdcf1c4d67a5cf82f0713886553f12986f793bfc3463dea56cc73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Analytical 
models</topic><topic>APT</topic><topic>attribution</topic><topic>Data models</topic><topic>dataset</topic><topic>Datasets</topic><topic>Decision trees</topic><topic>Deep learning</topic><topic>Feature extraction</topic><topic>Focusing</topic><topic>Machine learning</topic><topic>Malware</topic><topic>malware attribution</topic><topic>Random forests</topic><topic>Source coding</topic><topic>Training data</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Erfan Mazaheri, Mohamad</creatorcontrib><creatorcontrib>Shameli-Sendi, Alireza</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Erfan Mazaheri, Mohamad</au><au>Shameli-Sendi, Alireza</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2024</date><risdate>2024</risdate><volume>12</volume><spage>145148</spage><epage>145158</epage><pages>145148-145158</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure involves attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important ones are: aiding in incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These endeavors have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field continues to beckon further investigation to enhance attribution methodologies. The basis of developing an effective attribution systems is to benefit from a rich dataset. Previous studies in this domain have meticulously detailed the process of model training and evaluation using distinct datasets, each characterized by unique strengths, weaknesses, and varying number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans a minimum of 40 malware families. The samples in the dataset span the years 2020 to 2024, and their developer APT groups originate from Russia, South Korea, China, USA, Nigeria, North Korea, Pakistan and Belarus. 
Its richness and breadth render it invaluable for future research endeavors in the field of malware attribution.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3473021</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-4723-5793</orcidid><orcidid>https://orcid.org/0009-0000-8387-5619</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024, Vol.12, p.145148-145158
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_10704627
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Analytical models
APT
attribution
Data models
dataset
Datasets
Decision trees
Deep learning
Feature extraction
Focusing
Machine learning
Malware
malware attribution
Random forests
Source coding
Training data
Vectors
title APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T12%3A27%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=APTracker:%20A%20Comprehensive%20and%20Analytical%20Malware%20Dataset%20Based%20on%20Attribution%20to%20APT%20Groups&rft.jtitle=IEEE%20access&rft.au=Erfan%20Mazaheri,%20Mohamad&rft.date=2024&rft.volume=12&rft.spage=145148&rft.epage=145158&rft.pages=145148-145158&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3473021&rft_dat=%3Cproquest_ieee_%3E3115573897%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3115573897&rft_id=info:pmid/&rft_ieee_id=10704627&rft_doaj_id=oai_doaj_org_article_728706fed1d6438db17352aa78efa3a2&rfr_iscdi=true