The quality improvement method for detecting attacks on web applications using pre-trained natural language models

This paper explores the use of deep learning techniques to improve the performance of web application firewalls (WAFs), describes a specific method for improving the performance of web application firewalls, and presents the results of its testing on publicly available CSIC 2010 data. Most web appli...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Izvestiâ Saratovskogo universiteta. Novaâ seriâ. Seriâ Matematika. Mehanika. Informatika (Online) 2024-08, Vol.24 (3), p.442-451
Hauptverfasser: Kovaleva, O. A., Samokhvalov, A. V., Liashkov, M. A., Pchelintsev, S. Yu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper explores the use of deep learning techniques to improve the performance of web application firewalls (WAFs), describes a specific method for improving the performance of web application firewalls, and presents the results of its testing on publicly available CSIC 2010 data. Most web application firewalls work on the basis of rules that have been compiled by experts. When running, firewalls inspect HTTP requests exchanged between client and server to detect attacks and block potential threats. Manual drafting of rules requires experts' time, and distributed ready-made rule sets do not take into account the specifics of particular user applications, therefore they allow many false positives and miss many network attacks. In recent years, the use of pretrained language models has led to significant improvements in a diverse set of natural language processing tasks as they are able to perform knowledge transfer. The article describes the adaptation of these approaches to the field of information security, i.e. the use of a pretrained language model as a feature extractor to match an HTTP request with a feature vector. These vectors are then used to train the classifier. We offer a solution that consists of two stages. In the first step, we create a deep pre-trained language model based on normal HTTP requests to the web application. In the second step, we use this model as a feature extractor and train a one-class classifier. Both steps are performed for each application. The experimental results show that the proposed approach significantly outperforms the classical Mod-Security approaches based on rules configured using OWASP CRS and does not require the involvement of a security expert to define trigger rules.
ISSN:1816-9791
2541-9005
DOI:10.18500/1816-9791-2024-24-3-442-451