HARD: Bit-Split String Matching Using a Heuristic Algorithm to Reduce Memory Demand

Bibliographic details
Published in: Romanian journal of information science and technology 2020-01, Vol. 23 (T), p. T94-T105
Main authors: Li, Xun; Chen, Lishui; Tang, Yazhe
Format: Article
Language: English
Online access: Full text
Description
Abstract: High-speed content inspection relies on fast multi-pattern matching to detect predefined rules. As the number of target rules grows, the memory requirement of the matching engine becomes a critical issue. An effective way to design high-performance matching engines is to divide the target rule set into multiple subgroups and to assign a parallel hardware matching unit to each subgroup. The key question is how to partition the rules into subgroups. This paper proposes a rule classification method, referred to as HARD, for heterogeneous bit-split string matching architectures. HARD uses the uniqueness of the target patterns to classify all target rule characters. The paper also presents a method to estimate the distance between strings in the unique pattern category; this distance formula is then used to assign each rule to a class. Each class is then processed on finite state machines of different sizes. The experimental results show that the larger the rule set, the more pronounced the benefit of HARD. On popular data sets with more than 4000 rules, HARD saves nearly 50% of memory compared with the earlier bit-split string matching methods discussed in the paper.
ISSN: 1453-8245
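
For readers unfamiliar with the bit-split technique that the abstract builds on, the following is a minimal Python sketch of generic bit-split multi-pattern matching: one small Aho-Corasick machine per bit slice of each input byte, with the per-state partial match vectors ANDed together. It is not the HARD method; the rule classification and the distance formula are not given in this record and are not reproduced here. The names `build_slice_fsm` and `bit_split_search` and the `width` parameter are illustrative assumptions, and patterns are assumed to be non-empty byte strings.

```python
from collections import deque


def build_slice_fsm(patterns, shift, width):
    """Build an Aho-Corasick DFA over one bit slice of every pattern byte.

    Returns (goto, pmv): goto[state][symbol] is the next state, and pmv[state]
    is a bitmask (partial match vector) of the pattern ids whose slice
    projection ends in that state.
    """
    mask = (1 << width) - 1
    alphabet = 1 << width
    goto = [[-1] * alphabet]  # -1 marks a missing trie edge
    pmv = [0]
    for pid, pat in enumerate(patterns):
        state = 0
        for byte in pat:
            sym = (byte >> shift) & mask
            if goto[state][sym] == -1:
                goto.append([-1] * alphabet)
                pmv.append(0)
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        pmv[state] |= 1 << pid

    # Breadth-first pass: compute failure links and turn the trie into a DFA.
    fail = [0] * len(goto)
    queue = deque()
    for sym in range(alphabet):
        nxt = goto[0][sym]
        if nxt == -1:
            goto[0][sym] = 0
        else:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        pmv[state] |= pmv[fail[state]]  # inherit matches from the suffix state
        for sym in range(alphabet):
            nxt = goto[state][sym]
            if nxt == -1:
                goto[state][sym] = goto[fail[state]][sym]
            else:
                fail[nxt] = goto[fail[state]][sym]
                queue.append(nxt)
    return goto, pmv


def bit_split_search(patterns, text, width=2):
    """Return (start_index, pattern_id) for every match; width must divide 8."""
    mask = (1 << width) - 1
    fsms = [build_slice_fsm(patterns, shift, width) for shift in range(0, 8, width)]
    states = [0] * len(fsms)
    all_patterns = (1 << len(patterns)) - 1
    matches = []
    for pos, byte in enumerate(text):
        full = all_patterns
        for i, (goto, pmv) in enumerate(fsms):
            states[i] = goto[states[i]][(byte >> (i * width)) & mask]
            full &= pmv[states[i]]  # a pattern matches only if every slice agrees
        pid = 0
        while full:
            if full & 1:
                matches.append((pos - len(patterns[pid]) + 1, pid))
            full >>= 1
            pid += 1
    return matches


print(bit_split_search([b"he", b"she", b"his", b"hers"], b"ushers"))
# prints [(2, 0), (1, 1), (2, 3)]
```

With `width=2`, each byte drives four machines whose transition tables have 4 columns instead of 256, which is the source of the memory savings that bit-split designs aim for. How rules are grouped across machines of different sizes is the part that HARD addresses and is outside this sketch.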