Improved Deep Recurrent Q-Network of POMDPs for Automated Penetration Testing

With the development of technology, people’s daily lives are closely related to networks. The importance of cybersecurity protection draws global attention. Automated penetration testing is the novel method to protect the security of networks, which enhances efficiency and reduces costs compared wit...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied sciences 2022-10, Vol.12 (20), p.10339
Hauptverfasser: Zhang, Yue, Liu, Jingju, Zhou, Shicheng, Hou, Dongdong, Zhong, Xiaofeng, Lu, Canju
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the development of technology, people’s daily lives are closely related to networks. The importance of cybersecurity protection draws global attention. Automated penetration testing is the novel method to protect the security of networks, which enhances efficiency and reduces costs compared with traditional manual penetration testing. Previous studies have provided many ways to obtain a better policy for penetration testing paths, but many studies are based on ideal penetration testing scenarios. In order to find potential vulnerabilities from the perspective of hackers in the real world, this paper models the process of black-box penetration testing as a Partially Observed Markov Decision Process (POMDP). In addition, we propose a new algorithm named ND3RQN, which is applied to the automated black-box penetration testing. In the POMDP model, an agent interacts with a network environment to choose a better policy without insider information about the target network, except for the start points. To handle this problem, we utilize a Long Short-Term Memory (LSTM) structure empowering agent to make decisions based on historical memory. In addition, this paper enhances the current algorithm using the structure of the neural network, the calculation method of the Q-value, and adding noise parameters to the neural network to advance the generalization and efficiency of this algorithm. In the last section, we conduct comparison experiments of the ND3RQN algorithm and other recent state-of-the-art (SOTA) algorithms. The experimental results vividly show that this novel algorithm is able to find a greater attack-path strategy for all vulnerable hosts in the automated black-box penetration testing. Additionally, the generalization and robustness of this algorithm are far superior to other SOTA algorithms in different size simulation scenarios based on the CyberBattleSim simulation developed by Microsoft.
ISSN:2076-3417
2076-3417
DOI:10.3390/app122010339