Modified Data Structure of Aho-Corasick
Project ECE-526, Spring 2006
Benfano Soewito, Ed Flanigan and John Pangrazio
Southern Illinois University Carbondale

Introduction
• The Aho-Corasick algorithm is used to implement rule checking for Snort-type Intrusion Detection Systems (IDS).
• IDS sensors are currently placed on hosts and end nodes.
• Damage can be prevented sooner if the sensors sit at the core of the network.

Previous work
• A pattern matching machine for the set of keywords {he, she, his, hers}
• Each state has 256 next-state pointers, which use a large amount of memory

Aho-Corasick
Aho-Corasick:
• Multi-pattern string matching
• Time linear in the size of the input
How it works:
• Construct the state machine
• The state machine starts in the empty root node
• Each pattern is added to the state machine
• Failure pointers are added from each node to the node for the longest proper suffix of its string that is also a prefix of some pattern

Methodology
Goal in this project: Modify the Aho-Corasick algorithm to use less space in memory.
Methodology:
• Use a single pointer instead of 256 pointers
• Use a 256-bit bitmap

Methodology Diagram
[Figure: Bitmap Data Structure]

Expected result
• Use of a memory-efficient algorithm will allow implementation of Snort rules in a memory of 1.5Mb instead of 60Mb.
• Allows the rules to be stored in SRAM on a router/switch instead of on an independent host.
• Uses fewer memory lookups and a faster search method.
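
Sketch: bitmap node lookup
The single-pointer-plus-bitmap idea from the Methodology slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation. The names BitmapNode, build and search are hypothetical. A Python list stands in for the contiguous block of children that the single pointer would address, and a Python integer stands in for the 256-bit bitmap. The child for a byte is found by counting the set bits below that byte's position.

```python
from collections import deque


class BitmapNode:
    """Trie node with a 256-bit bitmap and a packed child list (illustrative)."""
    __slots__ = ("bitmap", "children", "fail", "outputs")

    def __init__(self):
        self.bitmap = 0      # bit b is set when a child exists for byte value b
        self.children = []   # only the existing children, ordered by byte value
        self.fail = None     # failure pointer (None for the root)
        self.outputs = []    # patterns that end at this node

    def _rank(self, byte):
        # Number of set bits strictly below position `byte`:
        # this is the child's index inside the packed list.
        return bin(self.bitmap & ((1 << byte) - 1)).count("1")

    def child(self, byte):
        if not (self.bitmap >> byte) & 1:
            return None
        return self.children[self._rank(byte)]

    def add_child(self, byte):
        existing = self.child(byte)
        if existing is not None:
            return existing
        node = BitmapNode()
        self.children.insert(self._rank(byte), node)
        self.bitmap |= 1 << byte
        return node


def build(patterns):
    """Build the trie, then add failure pointers breadth-first."""
    root = BitmapNode()
    for pattern in patterns:
        node = root
        for byte in pattern.encode("latin-1"):
            node = node.add_child(byte)
        node.outputs.append(pattern)

    queue = deque()
    for first in root.children:      # depth-1 nodes fail back to the root
        first.fail = root
        queue.append(first)
    while queue:
        node = queue.popleft()
        for byte in range(256):
            nxt = node.child(byte)
            if nxt is None:
                continue
            f = node.fail
            while f is not None and f.child(byte) is None:
                f = f.fail
            nxt.fail = f.child(byte) if f is not None else root
            # Inherit matches that end at the failure target.
            nxt.outputs = nxt.outputs + nxt.fail.outputs
            queue.append(nxt)
    return root


def search(root, text):
    """Return (start_offset, pattern) for every match in text."""
    hits = []
    node = root
    for pos, byte in enumerate(text.encode("latin-1")):
        while node is not root and node.child(byte) is None:
            node = node.fail
        nxt = node.child(byte)
        node = nxt if nxt is not None else root
        for pattern in node.outputs:
            hits.append((pos - len(pattern) + 1, pattern))
    return hits
```

With the keyword set {he, she, his, hers} from the Previous work slide, search(build(["he", "she", "his", "hers"]), "ushers") reports "she" at offset 1 and "he" and "hers" at offset 2. Assuming 4-byte pointers, a full node costs 256 × 4 = 1024 bytes, while a bitmap node costs 32 bytes of bitmap plus one pointer, which is where the memory saving would come from.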
Results: Execution Time
CPU Time vs. Number of Strings (96KB)
[Chart: CPU time vs. number of strings (10, 500, 1000), series 1K and 10K; labeled values 0.334, 0.35, 0.01, 0.034, 0.035]
String Matches
# Str    1K    10K
10        0      0
500      25     47
1000     28     51

Results: Execution Time
CPU Time vs. Number of Strings (1.5KB)
[Chart: CPU time vs. number of strings (10, 500, 1000), series 1K (1.5KB) and 10K (1.5KB); labeled values 2.775, 2.62, 0.395, 0.063, 0.271, 0.288]
String Matches
# Str    1K    10K
10        0      0
500      69     85
1000     73    102

Results: Memory
Memory Use vs. Number of Strings
[Chart: memory (KB) vs. number of strings (10, 500, 1000), series Aho and Bitmap; values as in the Results table]

Results
Statistics of Rules/Strings
Strings   Nodes   Pointers   MEM Non-Bitmap Aho (KB)   MEM Bitmap (KB)
10          175      15383                     61.532              6.30
500        5314     384032                   1557.128            191.30
1000      11410     942561                   3815.244            410.76
Total                                        5433.904            608.36
The bitmap total is 11.2 % of the non-bitmap total.

Discussion
• Memory use is linear with respect to the number of strings
• Execution time impact depends on the number of string matches
  – Minimal bitmap computation overhead

References
• A. V. Aho and M. J. Corasick. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.
• G. Varghese, T. Sherwood, N. Tuck and B. Calder. "Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection."
• R. S. Boyer and J. S. Moore. A fast string searching algorithm