Modified Data Structure of Aho-Corasick

Project ECE-526 Spring 2006
Benfano Soewito, Ed Flanigan and John Pangrazio
Southern Illinois University Carbondale
Introduction
• The Aho-Corasick algorithm is used to implement rule checking for Snort-type Intrusion Detection Systems (IDS).
• IDS sensors are currently placed on hosts and end nodes.
• An IDS can prevent damage sooner if it is placed at the core of the network.
Previous work
• A pattern matching machine for the set of keywords {he, she, his, hers}
• Each node has 256 next-state pointers, which use a large amount of memory
Aho-Corasick
Aho-Corasick:
• Multi-pattern string matching
• Time linear in the size of the input
How it works:
• Construct the state machine
• The state machine starts in the empty root node
• Each pattern is added to the state machine
• Failure pointers are added from each node to the node for the longest proper suffix of its string that is also a prefix of some pattern
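The construction steps above can be sketched in Python. This is a minimal illustration, not the project's implementation: the function names `build_automaton` and `search` are hypothetical, and each node uses a dictionary for its next-state table rather than the 256-pointer array described in the slides.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the empty root
    output = [set()]   # output[state] -> patterns recognized on reaching this state

    # Step 1: add each pattern to the state machine (a trie).
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Step 2: add failure pointers breadth-first. Depth-1 states fail to the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
```

For the keyword set {he, she, his, hers}, this builds ten states including the root, and scanning "ushers" reports "she", "he" and "hers" in a single pass over the input.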
Methodology
Goal in this project:
Modify the Aho-Corasick algorithm to use less space in memory.
Methodology:
• Use a single pointer per node instead of 256 pointers
• Use a 256-bit bitmap to mark which next states exist
[Figure: Diagram of the bitmap data structure]
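One way to read the single-pointer-plus-bitmap idea is sketched below. It assumes a layout in which the children of a node are stored contiguously in byte order, so that one pointer plus a population count over the bitmap locates any child. The class `BitmapNode` and its methods are hypothetical names for illustration, a Python integer stands in for the 256-bit bitmap, and a Python list stands in for the single pointer to the packed children. The slides do not confirm that the project used exactly this layout.

```python
class BitmapNode:
    """Trie node with one 256-bit bitmap and one reference to packed children."""

    def __init__(self):
        self.bitmap = 0      # bit b is set when a child exists for byte value b
        self.children = []   # stands in for the single pointer: children in byte order

    def _rank(self, byte):
        # Count the set bits below position `byte`; this is the child's index.
        return bin(self.bitmap & ((1 << byte) - 1)).count("1")

    def child(self, byte):
        """Return the child for `byte`, or None if the bit is not set."""
        if not (self.bitmap >> byte) & 1:
            return None
        return self.children[self._rank(byte)]

    def add_child(self, byte):
        """Return the child for `byte`, creating it if needed."""
        existing = self.child(byte)
        if existing is not None:
            return existing
        node = BitmapNode()
        self.children.insert(self._rank(byte), node)
        self.bitmap |= 1 << byte
        return node


def insert_pattern(root, pattern):
    """Add a bytes pattern to the trie rooted at `root`."""
    node = root
    for b in pattern:
        node = node.add_child(b)
    return node


root = BitmapNode()
for p in (b"he", b"she", b"his", b"hers"):
    insert_pattern(root, p)
print(len(root.children))  # children of the root: 'h' and 's'
```

Under this assumption, a lookup costs one bit test plus one population count, and a node stores only as many child slots as it has children instead of 256.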
Expected result
• Use of a memory-efficient algorithm will allow implementation of Snort rules in a memory of 1.5Mb instead of 60Mb.
• Allows the rules to be stored in SRAM on a router/switch instead of on an independent host.
• Uses fewer memory lookups and a faster search method.
Results: Execution Time
[Figure: CPU Time vs Number of Strings (96KB). x-axis: Number of Strings (10, 500, 1000); y-axis: CPU time; series: 1K, 10K. Data labels: 0.01, 0.034, 0.035, 0.334, 0.35]
String Matches
# Str   1K    10K
10      0     0
500     25    47
1000    28    51
Results: Execution Time
[Figure: CPU Time vs Number of Strings (1.5KB). x-axis: Number of Strings (10, 500, 1000); y-axis: CPU time; series: 1K (1.5KB), 10K (1.5KB). Data labels: 0.063, 0.271, 0.288, 0.395, 2.62, 2.775]
String Matches
# Str   1K    10K
10      0     0
500     69    85
1000    73    102
Results: Memory
[Figure: Memory Use vs Number of Strings. x-axis: Number of Strings; y-axis: Memory (Kb); series: Aho, Bitmap]
Results
Statistics of Rules/Strings
Strings   Nodes   Pointers   MEM Non-Bitmap Aho (KB)   MEM Bitmap (KB)
10        175     15383      61.532                    6.30
500       5314    384032     1557.128                  191.30
1000      11410   942561     3815.244                  410.76
Total                        5433.904                  608.36 (11.2 %)
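The totals and the 11.2 % figure can be checked with a few lines of arithmetic. The sketch below uses only the memory values from the table above. It reads the percentage as total bitmap memory divided by total non-bitmap memory, which is the reading that reproduces the reported number.

```python
# Memory in KB for 10, 500 and 1000 strings, copied from the table.
aho_kb = [61.532, 1557.128, 3815.244]
bitmap_kb = [6.30, 191.30, 410.76]

total_aho = sum(aho_kb)
total_bitmap = sum(bitmap_kb)
ratio_percent = 100 * total_bitmap / total_aho

print(f"Aho total:    {total_aho:.3f} KB")
print(f"Bitmap total: {total_bitmap:.2f} KB")
print(f"Bitmap / Aho: {ratio_percent:.1f} %")
```

On this reading, the bitmap structure needs about 11.2 % of the memory of the original structure across the three test sets, a reduction of roughly 88.8 %.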
Discussion
• Memory use is linear with respect to the number of strings
• Execution time impact depends on the number of string matches
  – Minimal bitmap computation overhead
References
• A. V. Aho and M. J. Corasick. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.
• G. Varghese, T. Sherwood, N. Tuck and B. Calder. Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection.
• R. S. Boyer and J. S. Moore. A fast string searching algorithm.