Boundary Detection in Tokenizing Network Application Payload for Anomaly Detection
Rachna Vargiya and Philip Chan
Department of Computer Sciences
Florida Institute of Technology

Motivation
• Existing anomaly detection techniques rely on information derived only from the packet headers
• More sophisticated attacks involve the application payload
  • Example: Code Red II worm: GET /default.ida?NNNNNNNNN…
• Parsing the payload is required!
• Problems in hand-coded parsing:
  • Large number of application protocols
  • Frequent introduction of new protocols

Problem Statement
• To parse application payload into tokens without explicit knowledge of the application protocols
• These tokens are later used as features for anomaly detection

Related Work
• Pattern Detection - Important Tokens
  • Fixed Length: Forrest et al. (1998)
  • Variable Length: Wespi et al. (2000), Jiang et al. (2002)
• Boundary Detection – All Tokens
  • VOTING EXPERTS by Cohen et al. (2002)
    • Boundary Entropy
    • Frequency
    • Binary Votes

Approach
• Boundary Finding Algorithms:
  • Boundary Entropy
  • Frequency
  • Augmented Expected Mutual Information
  • Minimum Description Length
• Approach is domain independent (no prior domain knowledge)

Combining Boundary Finding Algorithms
• Combination of all or a subset (e.g. Frequency + Minimum Description Length) of techniques
• Each algorithm can cast multiple votes, depending on confidence measure

Boundary Entropy (Cohen et al.)
• Entropy at the end of each possible window is calculated
• High entropy means more variation
• H(w) = -\sum_{x \in X} P(x \mid w) \log P(x \mid w)
• ‘x’ is the byte following the current window ‘w’
• Example string: Itisarainyday

Voting using Boundary Entropy
[Figure: boundary entropy at each position of "Itisarainyday"]
• Entropy in meaningful tokens starts with a high value, drops, and peaks at the end
• Vote for positions with the peak entropy
• Threshold suppresses votes for low entropy values
• Threshold = Average BE

Frequency (Cohen et al.)
• The most frequent tokens are assumed to be meaningful tokens
• Frequencies of tokens with length = 1, 2, 3, …, 6
• Shorter tokens are inherently more frequent than longer tokens
• Normalize frequencies for tokens of the same length using standard deviation
• Boundaries are assigned at the end of the most frequent token in the window
• Example: Itis arainyday
  • Frequency in window: (1) "I" = 3, (2) "It" = 5, (3) "Iti" = 2, (4) "Itis" = 3

Mutual Information (MI)
• Mutual Information given by: MI(a, b) = \lg \frac{P(a, b)}{P(a) P(b)}
• Gives us the reduction of uncertainty in presence of event ‘b’ given event ‘a’
• MI does not incorporate the counter evidence when ‘a’ occurs without ‘b’ and vice versa

Augmented Expected Mutual Information (AEMI)
• AEMI(A, B) = P(a, b) MI(a, b) - P(\bar{a}, b) MI(\bar{a}, b) - P(a, \bar{b}) MI(a, \bar{b})
• AEMI sums the supporting evidence and subtracts the counter evidence
• For each window, the location with the minimum AEMI value suggests a boundary
[Figure: "Itisarainyday" with adjacent segments a and b marked]

Minimum Description Length (MDL)
• Shorter code assigned to frequent tokens to minimize the overall coding length
• Boundary yielding shortest coding length is assigned votes
• Coding length per byte: MDL = -\sum_{i \in \{left, right\}} \frac{\lg P(t_i)}{|t_i|}
  • -\lg P(t_i): number of bits to encode t_i
  • |t_i|: length of t_i
[Figure: "Itisarainyday" split into t_left and t_right]

Normalize scores of each algorithm
• Each algorithm produces a list of scores
• Since the number of votes is proportional to the score, the scores must be normalized
• Each score is replaced by the number of standard deviations that the score is away from the mean value

Normalize votes of each algorithm
• Algorithms produce a list of votes depending on the scores
• Make sure each algorithm votes with the same weight
• Number of votes is replaced by the number of standard deviations from the mean value

Normalizing Scores and Votes
[Figure: for the window "ItIs", each algorithm's scores s1..s4 are normalized to ns1..ns4, converted to votes v1..v4, normalized again, and summed into the combined normalized votes]

Combined Approach with Weighted Voting
• A list of votes from all the experts is gathered
• For each boundary, the final votes are summed
• A boundary is placed at a position if the votes at the position exceed the threshold
• Threshold = Average number of Votes

Evaluation Criteria
• Evaluation A: % of space separated words retrieved
• Evaluation B: % of keywords in the protocol specification that were retrieved
• Evaluation C: entropy of the tokens in output file (lower the better)
• Evaluation D: number of detected attacks in network traffic
• A and B only for text based protocols

Anomaly Detection Algorithm – LERAD (Mahoney and Chan)
• LERAD forms rules based on 23 attributes
  • First 15 attributes: from packet header
  • Next 8 attributes: from the payload
• Example Rule: If port = 80 then word1 = "GET"
• Original Payload attributes: space separated tokens
• Our Payload attributes: boundary separated tokens

Experimental Data
• 1999 DARPA Intrusion Detection Evaluation Data Set
  • Week 3: attack free (training) data
  • Weeks 4, 5: attack containing (test) data
• Evaluations A, B, C (Known boundaries):
  • Week 3, trained: days 1 - 4, tested: days 5 – 7
  • Prevent gaining knowledge from Weeks 4 and 5
• Evaluation D (Detected attacks)
  • Trained: Week 3
  • Tested: Weeks 4 and 5

Evaluation A: % of Space-Separated Tokens Recovered
Method              Port 25   Port 80   Port 21   Port 79   Avg
Freq+MDL            52        26        21        81        45.0
Frequency           15        16        13        99        36.0
BE+AEMI+MDL+Freq    21        14        5         12        13.0
AEMI                5         9         4         32        12.5
MDL                 6         7         3         25        10.3
BE                  3         3         1         9         4.0

Evaluation B: % of Keywords in RFCs Recovered
Method              Port 25   Port 80   Port 21   Avg
Freq+MDL            40        36        59        45.0
Frequency           31        28        40        33.0
BE+AEMI+MDL+Freq    12        13        21        15.3
AEMI                9         5         2         5.3
MDL                 7         6         1         4.7
BE                  3         2         2         2.3

Evaluation C: Entropy of Output (Lower is Better), average across 6 ports
Method              Average Value
Frequency           5.0
MDL                 5.03
Freq+MDL            5.06
BE                  5.25
BE+AEMI+Freq+MDL    5.56
AEMI                6.38

Ranking of Algorithms
Method              Evaluation A   Evaluation B   Evaluation C
Freq+MDL            1              1              3
Frequency           2              2              1
BE+AEMI+MDL+Freq    3              3              5
AEMI                4              4              6
MDL                 5              5              2
BE                  6              6              4

Detection Rate for Space Separated vs. Boundary Separated (Freq + MDL)
                10 FP/day            100 FP/day
Port #          Space    Boundary    Space    Boundary
20              2        2           4        5
21              14       16          14       17
22              3        3           3        3
23              13       14          13       14
25              15       16          16       16
79              3        3           3        3
80              10       10          11       13
113             2        2           2        2
Overall         59       62          63       68
% Improvement   --       5           --       8

Summary of Contributions
• Used payload information, while most IDS concentrate on header information
• Proposed AEMI + MDL for boundary detection
• Combined all and subset of algorithms
• Used weighted voting to indicate confidence
• Proposed techniques find boundaries better than spaces
• Achieved higher detection rates in an anomaly detection system

Future Work
• Further evaluation on other ports
• Pick more useful tokens instead of first 8
• DARPA data set is partially synthetic, further evaluation on real traffic
• Evaluation with other anomaly detection algorithms

Thank you

Experimental Results
Table 4.3.4 Results from Additional Ports for Freq + MDL and ALL
Ports: 23, 115, 515
Columns: Evaluation A (% Words Found), Evaluation B (% Keywords Found), Evaluation (Entropy), each reported for Freq+MDL and ALL
Values in source order: 13 7 43 20 38 14 5 3 7.88 8.08 4.45 5.18 7.66 7.27
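
Sketch: Boundary Entropy voting

The Boundary Entropy slides describe computing the entropy of the byte that follows each window and voting for peak positions above the average. The sketch below is one possible reading of that description, not the authors' implementation. It assumes a single fixed window length, treats a "peak" as a local maximum, and estimates P(x | w) from counts in the input itself; the function names `boundary_entropy` and `entropy_votes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(data, window):
    """Entropy of the symbol that follows each length-`window` slice of `data`."""
    followers = defaultdict(Counter)
    last_start = len(data) - window  # slices starting here or later have no follower
    for i in range(last_start):
        followers[data[i:i + window]][data[i + window]] += 1

    scores = []
    for i in range(last_start):
        counts = followers[data[i:i + window]]
        total = sum(counts.values())
        # H(w) = sum over x of P(x|w) * log2(1 / P(x|w))
        scores.append(sum((c / total) * math.log2(total / c) for c in counts.values()))
    return scores


def entropy_votes(scores, window):
    """Vote for the position after each window whose entropy is a local peak
    above the average entropy (assumed reading of 'Threshold = Average BE')."""
    if not scores:
        return []
    threshold = sum(scores) / len(scores)
    votes = []
    for i, h in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if h > threshold and h >= left and h >= right:
            votes.append(i + window)  # boundary index just past the window
    return votes


scores = boundary_entropy("abcabdabe", 2)
print(entropy_votes(scores, 2))
```

In the toy string, "ab" is followed by three different symbols, so the positions right after each "ab" receive the votes. Real payloads would need counts gathered over a training corpus rather than a single string.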
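
Sketch: normalized, combined voting

The normalization and weighted-voting slides replace each algorithm's votes by their distance from the mean in standard deviations, sum the votes per position, and place a boundary where the sum exceeds the average. The sketch below illustrates only that arithmetic, under stated assumptions: it uses the population standard deviation, applies normalization to votes only (the deck also normalizes scores first), and the names `normalize` and `combine_votes` are hypothetical.

```python
from statistics import mean, pstdev


def normalize(values):
    """Replace each value by its distance from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # all values equal: nothing stands out
    return [(v - mu) / sigma for v in values]


def combine_votes(votes_per_algorithm):
    """Sum normalized votes per position and keep positions above the average.

    `votes_per_algorithm` holds one equal-length list of vote counts per
    algorithm, indexed by candidate boundary position.
    """
    normalized = [normalize(votes) for votes in votes_per_algorithm]
    totals = [sum(column) for column in zip(*normalized)]
    threshold = mean(totals)  # 'Threshold = Average number of Votes'
    return [pos for pos, total in enumerate(totals) if total > threshold]


# Two hypothetical algorithms voting over four candidate positions.
print(combine_votes([[0, 3, 0, 1], [1, 4, 0, 0]]))
```

Because each algorithm's votes are centered before summing, an algorithm that casts many votes everywhere carries no more weight than one that casts few, which is the stated purpose of the normalization step.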