Highly compressed multi-pattern string matching on the cell broadband engine
With its 9 cores per chip, the IBM Cell/Broadband Engine (Cell) can deliver an impressive amount of compute power and benefit the string-matching kernels of network security, networkbusiness analytics and natural language processing applications. However, the available amount of main memory on the s...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | With its 9 cores per chip, the IBM Cell/Broadband Engine (Cell) can deliver an impressive amount of compute power and benefit the string-matching kernels of network security, networkbusiness analytics and natural language processing applications. However, the available amount of main memory on the system limits the maximum size of the dictionary supported by the string matching solution. To counter that, we propose a technique that employs compressed Aho-Corasick automata to perform fast, exact multi-pattern string matching with very large dictionaries. Our technique achieves the remarkable compression factors of 1:34 and 1:58, respectively, on the memory representation of English-language dictionaries and random binary string dictionaries. We demonstrate a parallel implementation for the Cell processor that delivers a sustained throughput between 0.90 and 2.35 Gbps per Cell blade, while supporting dictionary sizes up to 9.2 Million average patterns per Gbyte of main memory, and exhibiting resilience to content-based attacks. This high dictionary density enables natural language applications of an unprecedented scale to run on a single server blade. |
---|---|
ISSN: | 1530-1346 2642-7389 |
DOI: | 10.1109/ISCC.2011.5983850 |