SI-DFA: Sub-expression Integrated Deterministic Finite Automata for Deep Packet Inspection
Authors: Ayesha Khalid, Rajat Sen, Anupam Chattopadhyay
Publisher: High Performance Switching and Routing (HPSR), 2013
Presenter: Pei-Hua Huang
Date: 2014/05/14
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

INTRODUCTION
• There is a space-time trade-off: NFAs are compact but slow, DFAs are fast but space hungry
• An ideal finite automaton should thus have the processing speed of a DFA and the space requirements of an NFA

Computer & Internet Architecture Lab, CSIE, National Cheng Kung University

STATE-EXPLOSION
A phenomenon called exponential state blowup (or state explosion) happens when the regex corresponding to the NFA contains the following constructs:
• Counting constraints
  1) .{n,m} : wildcard repetition between n and m times
  2) .{n,} : wildcard repetition at least n times
  3) .{n} : wildcard repetition exactly n times
• Kleene star (.*) conditions
  unbounded wildcard repetitions

SUB-EXPRESSION INTEGRATED DFA (SI-DFA)
Break an expression into parts at blowup conditions and merge them into an integrated DFA
• Break regexes into parts called sub-expressions, using Kleene star conditions as delimiters
• Create a merged DFA for all the sub-expressions
• The accepting states of the DFA are labeled as Final Accepting States (FAS) or Sub-expression Accepting States (SAS)

SUB-EXPRESSION INTEGRATED DFA (SI-DFA)
• A regex is accepted if its constituent sub-expressions are accepted in the right order
• A link bit is associated with every sub-expression; the addresses of the link bits are specified in an Association Table

SUB-EXPRESSION INTEGRATED DFA (SI-DFA)
Ex. Rules ab.*cd and lm
• Consider the traffic trace cdablmcd
• The leading cd is ignored because the link bit of ab is not yet set; ab then sets the link bit, lm is accepted on its own, and the trailing cd completes ab.*cd

Cases not Conforming with SI-DFA
Pseudo wildcard repetitions
• A forbidden character table is constructed, in which an occurrence of the forbidden character x invalidates the link bit corresponding to sub-expression ab
• Forbidden characters that occur in a subsequent sub-expression cannot be handled by SI-DFA
• Ex. RE = ab[^x]*cxd, input = abmcxd

Cases not Conforming with SI-DFA
Subsequent sub-expressions overlap
• SI-DFA should start matching a sub-expression only after the preceding sub-expression has already been accepted
• Ex. RE = ab.*bc, input = abc

Cases not Conforming with SI-DFA
Complete containment in subsequent sub-expressions
• SI-DFA will generate an erroneous result if a sub-expression in a regex is completely contained in its following sub-expression
• Ex. RE = a.*b.d, input = bad

Exact-match removal in .+ Cases
• A 'dot-plus' condition, e.g. ab.+cd, is treated as the expression that matches ab.*cd but does not match abcd
• First build a union automaton of L1 and L2, then convert the accepting state due to L2 into a non-accepting state
• L1 = {ab, cd}, L2 = {abcd}, L3 = L1 − L2

PERFORMANCE EVALUATION
• Developed in C++
• Testing platform: an AMD Phenom 1055T processor with 8 GB of RAM, running Linux
• Rule-sets extracted from Bro 2.0 [19], Snort [20], and Linux [21] rules
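
The link-bit mechanism from the SI-DFA slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C++ implementation: the names `split_rule` and `si_dfa_scan` are hypothetical, sub-expressions are assumed to be plain literals, and a simple suffix check stands in for the merged DFA. The Association Table is modeled as a list of link bits per rule.

```python
import re

def split_rule(regex):
    """Split a rule at Kleene-star (.*) delimiters into literal sub-expressions."""
    return regex.split(".*")

def si_dfa_scan(rules, trace):
    """Return (rule, end_index) for every rule the link-bit scheme reports.

    Assumption: a suffix check replaces the merged DFA, which is enough
    to show how link bits enforce the order of sub-expressions.
    """
    subs = [split_rule(r) for r in rules]
    # One link bit per non-final sub-expression of each rule
    # (a stand-in for the Association Table).
    link = [[False] * (len(parts) - 1) for parts in subs]
    matches = []
    for end in range(1, len(trace) + 1):
        for r, parts in enumerate(subs):
            # Reverse order: a link bit set at this position cannot
            # enable a later sub-expression at the same position.
            for k in reversed(range(len(parts))):
                if not trace.endswith(parts[k], 0, end):
                    continue
                if k > 0 and not link[r][k - 1]:
                    continue  # predecessor not accepted yet, so ignore
                if k == len(parts) - 1:
                    matches.append((rules[r], end - 1))  # acts as an FAS
                else:
                    link[r][k] = True  # acts as an SAS: set the link bit
    return matches

# The slide example: the leading "cd" is ignored, "lm" and "ab.*cd" are reported.
print(si_dfa_scan(["ab.*cd", "lm"], "cdablmcd"))
# → [('lm', 5), ('ab.*cd', 7)]

# The overlap case: the sketch reports a match although the regex does not match.
print(si_dfa_scan(["ab.*bc"], "abc"))   # → [('ab.*bc', 2)]
print(re.search("ab.*bc", "abc"))       # → None
```

The second call reproduces the overlapping sub-expression problem listed under the non-conforming cases: the shared character b lets bc be accepted right after ab, which a real regex engine rejects.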