Discovering Command and Control Channels Using Reinforcement Learning
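Main authors: | Wang, Cheng; Kakkar, Akshay; Redino, Christopher; Rahman, Abdul; S, Ajinsyam; Clark, Ryan; Radke, Daniel; Cody, Tyler; Huang, Lanxiao; Bowen, Edward |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Cryptography and Security; Computer Science - Learning |
Online access: | Order full text |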
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wang, Cheng; Kakkar, Akshay; Redino, Christopher; Rahman, Abdul; S, Ajinsyam; Clark, Ryan; Radke, Daniel; Cody, Tyler; Huang, Lanxiao; Bowen, Edward |
description | Command and control (C2) paths for issuing commands to malware are sometimes
the only indicators of its existence within networks. Identifying potential C2
channels is often a manually driven process that requires a deep understanding
of cyber tradecraft. Using a reinforcement learning (RL) based approach that
learns to automatically carry out C2 attack campaigns on large networks with
multiple defense layers in place can improve the discovery of these channels
and drive efficiency for network operators. In this paper, we model C2 traffic
flow as a three-stage process and formulate it as a Markov decision process
(MDP) with the objective of maximizing the number of valuable hosts whose data
is exfiltrated. The approach also specifically models payloads and defense
mechanisms such as firewalls, which is a novel contribution. The attack paths
learned by the RL agent can in turn help the blue team identify high-priority
vulnerabilities and develop improved defense strategies. The method is
evaluated on a large network with more than a thousand hosts, and the results
demonstrate that the agent can effectively learn attack paths while avoiding
firewalls. |
doi_str_mv | 10.48550/arxiv.2401.07154 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2401_07154</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2401_07154</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2401_071543</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw1DMwNzQ14WRwdcksTs4vSy3KzEtXcM7PzU3MS1EAYef8vJKi_BwF54zEvLzUnGKF0GKQkqDUzLy0_KLk1NzUvBIFn9TEojygMA8Da1piTnEqL5TmZpB3cw1x9tAFWxhfUJSZm1hUGQ-yOB5ssTFhFQA31jik</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Discovering Command and Control Channels Using Reinforcement Learning</title><source>arXiv.org</source><creator>Wang, Cheng ; Kakkar, Akshay ; Redino, Christopher ; Rahman, Abdul ; S, Ajinsyam ; Clark, Ryan ; Radke, Daniel ; Cody, Tyler ; Huang, Lanxiao ; Bowen, Edward</creator><creatorcontrib>Wang, Cheng ; Kakkar, Akshay ; Redino, Christopher ; Rahman, Abdul ; S, Ajinsyam ; Clark, Ryan ; Radke, Daniel ; Cody, Tyler ; Huang, Lanxiao ; Bowen, Edward</creatorcontrib><description>Command and control (C2) paths for issuing commands to malware are sometimes
the only indicators of its existence within networks. Identifying potential C2
channels is often a manually driven process that requires a deep understanding
of cyber tradecraft. Using a reinforcement learning (RL) based approach that
learns to automatically carry out C2 attack campaigns on large networks with
multiple defense layers in place can improve the discovery of these channels
and drive efficiency for network operators. In this paper, we model C2 traffic
flow as a three-stage process and formulate it as a Markov decision process
(MDP) with the objective of maximizing the number of valuable hosts whose data
is exfiltrated. The approach also specifically models payloads and defense
mechanisms such as firewalls, which is a novel contribution. The attack paths
learned by the RL agent can in turn help the blue team identify high-priority
vulnerabilities and develop improved defense strategies. The method is
evaluated on a large network with more than a thousand hosts, and the results
demonstrate that the agent can effectively learn attack paths while
avoiding firewalls.</description><identifier>DOI: 10.48550/arxiv.2401.07154</identifier><language>eng</language><subject>Computer Science - Cryptography and Security ; Computer Science - Learning</subject><creationdate>2024-01</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2401.07154$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.1109/SoutheastCon51012.2023.10115173$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2401.07154$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Cheng</creatorcontrib><creatorcontrib>Kakkar, Akshay</creatorcontrib><creatorcontrib>Redino, Christopher</creatorcontrib><creatorcontrib>Rahman, Abdul</creatorcontrib><creatorcontrib>S, Ajinsyam</creatorcontrib><creatorcontrib>Clark, Ryan</creatorcontrib><creatorcontrib>Radke, Daniel</creatorcontrib><creatorcontrib>Cody, Tyler</creatorcontrib><creatorcontrib>Huang, Lanxiao</creatorcontrib><creatorcontrib>Bowen, Edward</creatorcontrib><title>Discovering Command and Control Channels Using Reinforcement Learning</title><description>Command and control (C2) paths for issuing commands to malware are sometimes
the only indicators of its existence within networks. Identifying potential C2
channels is often a manually driven process that requires a deep understanding
of cyber tradecraft. Using a reinforcement learning (RL) based approach that
learns to automatically carry out C2 attack campaigns on large networks with
multiple defense layers in place can improve the discovery of these channels
and drive efficiency for network operators. In this paper, we model C2 traffic
flow as a three-stage process and formulate it as a Markov decision process
(MDP) with the objective of maximizing the number of valuable hosts whose data
is exfiltrated. The approach also specifically models payloads and defense
mechanisms such as firewalls, which is a novel contribution. The attack paths
learned by the RL agent can in turn help the blue team identify high-priority
vulnerabilities and develop improved defense strategies. The method is
evaluated on a large network with more than a thousand hosts, and the results
demonstrate that the agent can effectively learn attack paths while
avoiding firewalls.</description><subject>Computer Science - Cryptography and Security</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw1DMwNzQ14WRwdcksTs4vSy3KzEtXcM7PzU3MS1EAYef8vJKi_BwF54zEvLzUnGKF0GKQkqDUzLy0_KLk1NzUvBIFn9TEojygMA8Da1piTnEqL5TmZpB3cw1x9tAFWxhfUJSZm1hUGQ-yOB5ssTFhFQA31jik</recordid><startdate>20240113</startdate><enddate>20240113</enddate><creator>Wang, Cheng</creator><creator>Kakkar, Akshay</creator><creator>Redino, Christopher</creator><creator>Rahman, Abdul</creator><creator>S, Ajinsyam</creator><creator>Clark, Ryan</creator><creator>Radke, Daniel</creator><creator>Cody, Tyler</creator><creator>Huang, Lanxiao</creator><creator>Bowen, Edward</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240113</creationdate><title>Discovering Command and Control Channels Using Reinforcement Learning</title><author>Wang, Cheng ; Kakkar, Akshay ; Redino, Christopher ; Rahman, Abdul ; S, Ajinsyam ; Clark, Ryan ; Radke, Daniel ; Cody, Tyler ; Huang, Lanxiao ; Bowen, Edward</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2401_071543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Cryptography and Security</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Wang, Cheng</creatorcontrib><creatorcontrib>Kakkar, Akshay</creatorcontrib><creatorcontrib>Redino, Christopher</creatorcontrib><creatorcontrib>Rahman, Abdul</creatorcontrib><creatorcontrib>S, Ajinsyam</creatorcontrib><creatorcontrib>Clark, Ryan</creatorcontrib><creatorcontrib>Radke, Daniel</creatorcontrib><creatorcontrib>Cody, Tyler</creatorcontrib><creatorcontrib>Huang, 
Lanxiao</creatorcontrib><creatorcontrib>Bowen, Edward</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Cheng</au><au>Kakkar, Akshay</au><au>Redino, Christopher</au><au>Rahman, Abdul</au><au>S, Ajinsyam</au><au>Clark, Ryan</au><au>Radke, Daniel</au><au>Cody, Tyler</au><au>Huang, Lanxiao</au><au>Bowen, Edward</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Discovering Command and Control Channels Using Reinforcement Learning</atitle><date>2024-01-13</date><risdate>2024</risdate><abstract>Command and control (C2) paths for issuing commands to malware are sometimes
the only indicators of its existence within networks. Identifying potential C2
channels is often a manually driven process that requires a deep understanding
of cyber tradecraft. Using a reinforcement learning (RL) based approach that
learns to automatically carry out C2 attack campaigns on large networks with
multiple defense layers in place can improve the discovery of these channels
and drive efficiency for network operators. In this paper, we model C2 traffic
flow as a three-stage process and formulate it as a Markov decision process
(MDP) with the objective of maximizing the number of valuable hosts whose data
is exfiltrated. The approach also specifically models payloads and defense
mechanisms such as firewalls, which is a novel contribution. The attack paths
learned by the RL agent can in turn help the blue team identify high-priority
vulnerabilities and develop improved defense strategies. The method is
evaluated on a large network with more than a thousand hosts, and the results
demonstrate that the agent can effectively learn attack paths while
avoiding firewalls.</abstract><doi>10.48550/arxiv.2401.07154</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2401.07154 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2401_07154 |
source | arXiv.org |
subjects | Computer Science - Cryptography and Security; Computer Science - Learning |
title | Discovering Command and Control Channels Using Reinforcement Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T09%3A36%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovering%20Command%20and%20Control%20Channels%20Using%20Reinforcement%20Learning&rft.au=Wang,%20Cheng&rft.date=2024-01-13&rft_id=info:doi/10.48550/arxiv.2401.07154&rft_dat=%3Carxiv_GOX%3E2401_07154%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
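The abstract states that C2 traffic flow is modeled as a three-stage process and formulated as an MDP whose objective is to maximize the number of valuable hosts whose data is exfiltrated, with firewalls modeled explicitly. The record does not give the paper's actual stages, state space, or rewards, so the Python sketch below is only a toy illustration under loudly stated assumptions: a hypothetical five-host network (`LINKS`, `FIREWALLED`, `VALUABLE`), a hypothetical reading of the three stages as lateral movement, C2 channel establishment, and exfiltration, and made-up rewards. It also swaps the paper's RL agent for exact value iteration, which is only feasible because this toy MDP has 40 states; all function names are illustrative.

```python
from itertools import combinations, product

# --- Hypothetical toy network (NOT the paper's 1000+ host environment) ---
LINKS = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]  # undirected links; host 0 = foothold
FIREWALLED = {frozenset({0, 2})}                   # assumed firewall on the 0-2 link
VALUABLE = frozenset({3, 4})                       # assumed high-value hosts
HOSTS = range(5)
GAMMA = 0.9                                        # assumed discount factor

# A state is (current_host, c2_established, frozenset_of_exfiltrated_hosts).


def neighbors(host):
    return sorted(b if a == host else a for a, b in LINKS if host in (a, b))


def actions(state):
    host, _, _ = state
    return [("move", n) for n in neighbors(host)] + [("c2", None), ("exfil", None)]


def is_terminal(state):
    return state[2] == VALUABLE  # episode ends once every valuable host is exfiltrated


def step(state, action):
    """Deterministic transition returning (next_state, reward); rewards are made up."""
    host, c2, done = state
    kind, target = action
    if kind == "move":  # assumed stage 1: lateral movement to a neighboring host
        if frozenset({host, target}) in FIREWALLED:
            return state, -10.0  # blocked by the firewall: penalty, no progress
        return (target, False, done), -1.0  # no C2 channel on the new host yet
    if kind == "c2":  # assumed stage 2: establish a C2 channel on the current host
        return (host, True, done), -1.0
    # assumed stage 3: exfiltration only succeeds over an established channel
    if c2 and host in VALUABLE and host not in done:
        return (host, c2, done | {host}), 20.0
    return state, -1.0  # useless exfiltration attempt


def all_states():
    vals = sorted(VALUABLE)
    subsets = [frozenset(c) for r in range(len(vals) + 1) for c in combinations(vals, r)]
    return list(product(HOSTS, (False, True), subsets))


def q_value(V, state, action):
    nxt, reward = step(state, action)
    return reward + GAMMA * V[nxt]


def value_iteration(tol=1e-9):
    """Exact dynamic-programming solve; stands in for the paper's RL training."""
    V = {s: 0.0 for s in all_states()}
    while True:
        delta = 0.0
        for s in V:
            if is_terminal(s):
                continue  # terminal states keep value 0
            best = max(q_value(V, s, a) for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_plan(V, start=(0, False, frozenset()), max_steps=20):
    """Follow the greedy policy w.r.t. V from the foothold until all data is exfiltrated."""
    state, plan = start, []
    for _ in range(max_steps):
        if is_terminal(state):
            break
        action = max(actions(state), key=lambda a: q_value(V, state, a))
        plan.append(action)
        state, _ = step(state, action)
    return plan, state


if __name__ == "__main__":
    V = value_iteration()
    plan, final_state = greedy_plan(V)
    print(plan)
```

Running it prints `[('move', 1), ('move', 3), ('c2', None), ('exfil', None), ('move', 4), ('c2', None), ('exfil', None)]`: the agent reaches host 3 through host 1 instead of crossing the firewalled 0-2 link, a miniature version of the firewall-avoiding behavior the abstract reports. On networks of realistic size the state space grows far too quickly for exact value iteration, which is one reason to use a learned policy instead.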