Application-Oblivious L7 Parsing Using Recurrent Neural Networks
Published in: IEEE/ACM Transactions on Networking, 2020-10, Vol. 28 (5), pp. 2009-2022
Main authors: , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Extracting fields from layer 7 protocols such as HTTP, known as L7 parsing, is the key to many critical network applications. However, existing L7 parsing techniques center around protocol specifications, thereby incurring large human effort in specifying data formats and high computational/memory costs that scale poorly with the explosive number of L7 protocols. To this end, this paper introduces a new framework named content-based L7 parsing, where the content instead of the format becomes the first-class citizen. Under this framework, users only need to label what content they are interested in, and the parser learns an extraction model from the users' labeling behaviors. Since the parser is specification-independent, both the human effort and the computational/memory costs can be dramatically reduced. To realize content-based L7 parsing, we propose REPLAY, which builds on a recurrent neural network (RNN) and addresses a series of technical challenges such as large labeling overhead and slow parsing speed. We prototype REPLAY on GPUs and show that it achieves a precision of 98% and a recall of 97%, with a throughput as high as 12 Gbps, for diverse extraction tasks.
ISSN: 1063-6692, 1558-2566
DOI: 10.1109/TNET.2020.3000430
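
The abstract above describes REPLAY only at a high level. As a rough illustration of the general idea it names, the following is a minimal, hypothetical sketch of RNN-based field extraction: a byte-level bidirectional LSTM (here in PyTorch) that tags each byte of an L7 payload as belonging to a field of interest or not, learning from user-labeled examples rather than from a protocol specification. The class name `ByteTagger`, the layer sizes, and the two-tag scheme are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only -- NOT the authors' REPLAY implementation.
# A byte-level BiLSTM tagger: one tag per payload byte, no protocol format knowledge.
import torch
import torch.nn as nn

class ByteTagger(nn.Module):
    """Byte-level BiLSTM that emits a tag for every byte of a payload (hypothetical)."""
    def __init__(self, embed_dim=64, hidden_size=128, num_tags=2):
        super().__init__()
        # 256 possible byte values; learned embeddings stand in for any format knowledge.
        self.embed = nn.Embedding(256, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_size,
                           batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, byte_ids):
        # byte_ids: (batch, seq_len) long tensor of raw payload bytes in [0, 255]
        x = self.embed(byte_ids)
        out, _ = self.rnn(x)
        return self.classify(out)  # (batch, seq_len, num_tags) per-byte tag logits

# Toy usage: score each byte of an HTTP request as field / non-field.
payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
ids = torch.tensor([list(payload)], dtype=torch.long)  # shape (1, len(payload))
model = ByteTagger()
logits = model(ids)             # (1, len(payload), 2)
tags = logits.argmax(dim=-1)    # one predicted label per payload byte
print(tags.shape)               # torch.Size([1, 47])
```

In such a setup, training would minimize a per-byte cross-entropy loss against user-provided labels, which is one plausible way to realize the "learn an extraction model from the users' labeling behaviors" idea stated in the abstract.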