GPEP: Graphics Processing Enhanced Pattern Matching for High-Performance Deep Packet Inspection
Authors: Lucas John Vespa, Ning Weng
Publisher: 2011 IEEE International Conferences on Internet of Things, and Cyber, Physical and Social Computing (4th CPSCom)
Presenter: Ye-Zhi Chen
Date: 2012/04/25

Introduction
GPEP uses an optimized version of the authors' pattern matching algorithm, P3FSM. P3FSM has low operational complexity and reduces the memory requirement enough that the state tables fit into the small on-chip memories of a GPU.

P3FSM
1. DFA Optimization (split-DFA): This optimization splits the DFA transitions into primary and secondary blocks at the first level of the DFA. All incoming transitions to the primary block are removed from the DFA. An example split-DFA is shown in Figure 1(b). The two blocks are encoded into two separate memory tables. If a transition is not present in the secondary block table, the primary block table acts as a default transition lookup for the current input character.
[Figure 1(b): split-DFA with its primary and secondary blocks]
2. Deriving State Codes:
(1) Group all states in the SDFA (split-DFA) that have the same next state into a group.
(2) Groups with the same character are combined into a cluster.
(3) The number of clusters is reduced by merging clusters that share no common states in the secondary block into a single cluster.
(4) Encoding the groups:
  a) Character Signature (cs): identifies the character required for transitions to a state.
  b) State Signature (ss): identifies the next state.
(5) State code: the state code of each state is obtained by concatenating the group codes of the groups that the state is a member of.

Example groups (group: member states, character):
G1: {S0}, H
G2: {S0}, S
G3: {S1}, E
G4: {S1, S5}, I
G5: {S2, S7, S9}, H
G6: {S3, S8}, R
G7: {S4}, S
G8: {S5}, E
G9: {S6}, S
[Figure: clusters C1 and C2 over the characters H, S, E, R, I]

Operating Tables:
(1) Character/Cluster table (cc)
(2) Code table (code), indexed by $S_{index} = Ch_{offset} + S_{sig}$
[Figure: operating tables, including the failure index]

Memory Efficiency:
Equation 1: $STT = Q \cdot \lceil \log_2 Q \rceil \cdot 2^8$, where Q is the total number of states of the DFA.
Equation 2: $P3FSM = Q \cdot (L + \lceil \log_2 P \rceil)$, where L is the length of the state code and P is the number of patterns to be detected.

GPEP ARCHITECTURE
Host:
• The host creates and optimizes the DFA.
• The host transfers the resulting tables to the memory of the GPU.
• The host also maintains the current packet buffer, which is mapped to the global memory of the GPU.
Device:
The memory tables needed by the P3FSM kernel are stored in the local data store (LDS) of each compute unit and in the private memory of each stream core.
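Returning to the split-DFA optimization in P3FSM, the default-transition lookup can be sketched in Python as follows. Only the rule "try the secondary block table, otherwise fall back to the primary block table" comes from the slides. The dictionary table layouts, the names `next_state`, `scan`, `PRIMARY`, `SECONDARY` and `ACCEPTING`, and the toy DFA for the patterns "he" and "she" are hypothetical illustrations, not the paper's memory encoding.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table layouts (illustrative, not the paper's encoding):
# PRIMARY maps a character to a first-level state and acts as the default
# transition for every state; SECONDARY holds only the remaining
# (state, char) -> next-state transitions.
PrimaryTable = Dict[str, int]
SecondaryTable = Dict[Tuple[int, str], int]

ROOT = 0

# Toy split-DFA for the patterns "he" and "she" (assumed example):
# 0 = root, 1 = "h", 2 = "he", 3 = "s", 4 = "sh", 5 = "she".
PRIMARY: PrimaryTable = {"h": 1, "s": 3}
SECONDARY: SecondaryTable = {(1, "e"): 2, (3, "h"): 4, (4, "e"): 5}
ACCEPTING: Set[int] = {2, 5}


def next_state(state: int, ch: str,
               primary: PrimaryTable, secondary: SecondaryTable) -> int:
    """Try the secondary block first; otherwise use the primary block as default."""
    hit = secondary.get((state, ch))
    if hit is not None:
        return hit
    # The default lookup depends only on the input character;
    # characters absent from the primary block return to the root.
    return primary.get(ch, ROOT)


def scan(text: str, primary: PrimaryTable, secondary: SecondaryTable,
         accepting: Set[int]) -> List[int]:
    """Return the offsets at which an accepting state is entered."""
    state, hits = ROOT, []
    for i, ch in enumerate(text):
        state = next_state(state, ch, primary, secondary)
        if state in accepting:
            hits.append(i)
    return hits


print(scan("ushers", PRIMARY, SECONDARY, ACCEPTING))  # [3]
```

Scanning "ushers" reports a single hit at offset 3, where state 5 ("she", whose suffix is also "he") is entered. Note that no transition into the first-level states 1 and 3 appears in `SECONDARY`; those are all served by the primary table, which is what lets the secondary table stay small.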
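Steps (2), (3) and (5) of state-code derivation can be sketched on the example groups G1 to G9 (the output of step (1)). This is a sketch under stated assumptions: the greedy merge order, the per-cluster field width, and the field value (the 1-based position of the state's group in the cluster, 0 meaning "not a member") are illustrative choices, since the slides do not give the exact bit encoding, and step (4) signatures are not modeled. A DFA state has at most one next state per character, so it belongs to at most one group per character cluster; merging only clusters with disjoint state sets appears to be what preserves this, so one field per merged cluster suffices.

```python
from typing import Dict, List, Set, Tuple

# Example groups from the slides: name -> (member states, character).
GROUPS: Dict[str, Tuple[Set[str], str]] = {
    "G1": ({"S0"}, "H"),
    "G2": ({"S0"}, "S"),
    "G3": ({"S1"}, "E"),
    "G4": ({"S1", "S5"}, "I"),
    "G5": ({"S2", "S7", "S9"}, "H"),
    "G6": ({"S3", "S8"}, "R"),
    "G7": ({"S4"}, "S"),
    "G8": ({"S5"}, "E"),
    "G9": ({"S6"}, "S"),
}


def clusters_by_char(groups) -> List[List[str]]:
    """Step (2): combine groups that share a character into one cluster."""
    by_char: Dict[str, List[str]] = {}
    for name, (_, ch) in groups.items():
        by_char.setdefault(ch, []).append(name)
    return list(by_char.values())


def states_of(cluster: List[str], groups) -> Set[str]:
    """All states that are members of any group in the cluster."""
    return set().union(*(groups[g][0] for g in cluster))


def merge_clusters(clusters: List[List[str]], groups) -> List[List[str]]:
    """Step (3), greedy (assumed order): fold each cluster into the first
    merged cluster with which it shares no state."""
    merged: List[List[str]] = []
    for c in clusters:
        for m in merged:
            if states_of(m, groups).isdisjoint(states_of(c, groups)):
                m.extend(c)
                break
        else:
            merged.append(list(c))
    return merged


def state_codes(merged: List[List[str]], groups) -> Dict[str, str]:
    """Step (5): concatenate one fixed-width field per merged cluster
    (assumed encoding: 1-based group position, 0 if not a member)."""
    widths = [len(m).bit_length() for m in merged]
    states = sorted(set().union(*(s for s, _ in groups.values())))
    codes = {}
    for s in states:
        fields = []
        for m, w in zip(merged, widths):
            idx = next((i + 1 for i, g in enumerate(m) if s in groups[g][0]), 0)
            fields.append(format(idx, f"0{w}b"))
        codes[s] = "".join(fields)
    return codes


MERGED = merge_clusters(clusters_by_char(GROUPS), GROUPS)
CODES = state_codes(MERGED, GROUPS)
print(MERGED)  # [['G1', 'G5', 'G3', 'G8', 'G6'], ['G2', 'G7', 'G9', 'G4']]
print(CODES["S0"], CODES["S5"])  # 001001 100100
```

On these groups the greedy pass leaves two merged clusters (characters H, E, R and S, I), and every state code is 6 bits long. A different merge order could give a different, equally valid clustering.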
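Equations 1 and 2 can be compared numerically with a short sketch. The sizes Q = 1024, L = 32 and P = 100 are hypothetical and are not figures from the paper. Reading the results as bits is an assumption, and the helper names `ceil_log2`, `stt_bits` and `p3fsm_bits` are illustrative.

```python
def ceil_log2(n: int) -> int:
    """Exact ceil(log2 n) for n >= 1, using integer arithmetic."""
    return (n - 1).bit_length()


def stt_bits(q: int) -> int:
    """Equation 1: Q states x 2^8 input characters x ceil(log2 Q) bits per entry."""
    return q * ceil_log2(q) * 2**8


def p3fsm_bits(q: int, code_len: int, num_patterns: int) -> int:
    """Equation 2: Q states x (L-bit state code + ceil(log2 P) bits)."""
    return q * (code_len + ceil_log2(num_patterns))


# Hypothetical sizes, not figures from the paper.
Q, L, P = 1024, 32, 100
print(stt_bits(Q), p3fsm_bits(Q, L, P))  # 2621440 39936
```

The full transition table grows with the 2^8-entry alphabet for every state. The P3FSM size grows only with the state-code length, which is why its tables can fit in the LDS and private memory of the GPU.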