Discovering Command and Control Channels Using Reinforcement Learning

Detailed description

Command and control (C2) paths for issuing commands to malware are sometimes the only indicators of its existence within networks. Identifying potential C2 channels is often a manually driven process that requires a deep understanding of cyber tradecraft. Improving the discovery of these channels with a reinforcement learning (RL) based approach that learns to automatically carry out C2 attack campaigns on large networks with multiple layers of defense in place helps network operators work more efficiently. In this paper, we model C2 traffic flow as a three-stage process and formulate it as a Markov decision process (MDP) with the objective of maximizing the number of valuable hosts whose data is exfiltrated. The approach also explicitly models payloads and defense mechanisms such as firewalls, which is a novel contribution. The attack paths learned by the RL agent can in turn help the blue team identify high-priority vulnerabilities and develop improved defense strategies. The method is evaluated on a large network with more than a thousand hosts, and the results demonstrate that the agent can effectively learn attack paths while avoiding firewalls.
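A minimal sketch of the kind of MDP objective the description implies (the symbols v_h and c_f and the exact reward structure are assumptions for illustration, not taken from the paper):

\[
\max_{\pi}\; J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right],
\qquad
r(s_t, a_t) \;=\;
\begin{cases}
v_h & \text{if } a_t \text{ completes exfiltration from a valuable host } h,\\
-c_f & \text{if the chosen C2 path is blocked or flagged by a firewall},\\
0 & \text{otherwise},
\end{cases}
\]

where the state s_t would track which hosts are compromised and which of the three C2 stages each channel has reached, the action a_t selects the next host or link to extend the channel, \gamma is a discount factor, v_h is an assumed value assigned to host h, and c_f is an assumed penalty for triggering a defense.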

Bibliographic details

Main authors: Wang, Cheng; Kakkar, Akshay; Redino, Christopher; Rahman, Abdul; S, Ajinsyam; Clark, Ryan; Radke, Daniel; Cody, Tyler; Huang, Lanxiao; Bowen, Edward
Format: Article
Language: English
Date: 2024-01-13
Subjects: Computer Science - Cryptography and Security; Computer Science - Learning
DOI: 10.48550/arxiv.2401.07154
Source: arXiv.org
Online access: full text at https://arxiv.org/abs/2401.07154