Towards High Speed Network Defense
Zhichun Li
EECS Department, Northwestern University

Agenda
• Briefly introduce my thesis work
• Dive into high performance vulnerability signature matching
• Future research directions

Motivation
• Professional attackers exploit enterprise networks for profit ($$$)
[Figure: attackers, botnets, worms]

Network Level Defense
• Network gateways/routers are the vantage points for detecting large scale attacks
• Host based detection/prevention alone is not enough for modern enterprise networks
  – Some users do not apply the host-based schemes because of reliability, overhead, and conflicts.
  – Many users do not update or patch their systems on time.
  – Enterprises cannot rely only on their end users for security protection

Challenges
• Scalable to high speed networks with a large number of users
• Need to be highly accurate
• Adapt fast to emerging threats
• Have good attack coverage

Network-based Intrusion Detection, Prevention, and Forensics System
• Framework
  – (I) Sketch based monitoring & detection: scalability
  – (II) Polymorphic worm signature generation: accuracy & adapt fast
  – (III) Signature matching engines: accuracy & scalability & coverage
  – (IV) Network situational awareness: accuracy & adapt fast
  – Inputs: packet streams; honeynets/honeyfarms

Network-based Intrusion Detection, Prevention, and Forensics System (I)
• Online traffic monitoring and recording [INFOCOM 2006, ToN 2007] (cited by 30+)
  – Reversible sketch for data streaming computation
  – Record millions of flows (GB traffic) in a few hundred KB
  – Small # of memory accesses per packet
  – Scalable to large key space size (2^32 or 2^64)
• Online sketch-based flow-level anomaly detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 2006]
  – Detect TCP SYN flooding, horizontal and vertical scans even when mixed
[Figure: sketch with H hash tables of K buckets (0 .. K-1) each; a key k is hashed by h1(k), ..., hj(k), ..., hH(k)]

Network-based Intrusion Detection, Prevention, and Forensics System (II)
• Polymorphic worm signature generation
  – Token based Signature [IEEE Symposium on Security and Privacy 2006] (cited by 40+; code requested by Columbia U., UT Austin, Purdue, Georgia Tech, UC Davis, etc.)
  – Network based Vulnerability Signature [IEEE ICNP 2007] [NSF Cyber Trust Award]
[Figure: traffic from the Internet passing through a network gateway into our network]

Network-based Intrusion Detection, Prevention, and Forensics System (III)
• NetShield: Vulnerability Signature based NIDS/NIPS [under submission] [NSF Cyber Trust Award] (Cisco and Juniper have expressed interest)
• Focus of this talk, details come later

Network-based Intrusion Detection, Prevention, and Forensics System (IV)
• Large-scale botnet and P2P misconfiguration event situational-aware forensics
  – Botnet attack target/strategy inference [ASIACCS09]
  – Root cause analysis of the P2P misconfiguration/poisoning traffic [under submission]
[Figure: DDoS attack scenario; peers flood file requests (misconfigured traffic) at an innocent victim]

NetShield: Matching a Large Vulnerability Signature Ruleset for High Performance Network Defense

NetShield Overview
• NIDS/NIPS (Network Intrusion Detection/Prevention System) operation
[Figure: packets and a signature DB feed the NIDS/NIPS, which outputs security alerts]
• Accuracy
• Speed
• Attack coverage

State of the art
Regular expression (regex) based approaches
Example: .*Abc.*\x90+de[^\r\n]{30}
Pros
• Can efficiently match multiple sigs simultaneously, through DFA
• Can describe the syntactic context
Cons
• Limited expressive power
• Cannot describe the semantic context
• Inaccurate

State of the art
Vulnerability Signature [Wang et al. 04]
Example:
  BIND: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && context[0].abstract_syntax.uuid==UUID_RemoteActivation
  BIND-ACK: rpc_vers==5 && rpc_vers_minor==1
  CALL: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && stub.RemoteActivationBody.actual_length>=40 && matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)
Pros
• Directly describe semantic context
• Very expressive, can express the vulnerability condition exactly
• Accurate
Cons
• Slow!
• Existing approaches all use sequential matching
• Require protocol parsing

Motivation of NetShield
[Figure: speed (low to high) against accuracy (low to high). State-of-the-art regex signature IDSes are fast but bounded by the theoretical accuracy limitation of regex; existing vulnerability signature IDSes are accurate but slow; NetShield targets both high speed and high accuracy.]

Motivation
• Desired features for signature-based NIDS/NIPS
  – Accuracy (especially for IPS)
  – Speed
  – Coverage: large ruleset

              Regular Expression    Vulnerability
  Accuracy    Relatively poor       Much better
  Speed       Good                  ??
  Memory      OK                    ??
  Coverage    Good                  ??

• Regular expressions cannot capture the vulnerability condition well!
• Vulnerability signatures: Shield [SIGCOMM'04]; the open cells (??) are the focus of this work

Research Challenges
• Background
  – Use protocol semantics to express the vulnerability
  – Defined on a sequence of PDUs, with one predicate for each PDU
  – Example: ver==1 && method=="put" && len(buf)>300
• Challenges
  – Matching thousands of vulnerability signatures simultaneously
    • Sequential matching → match multiple sigs simultaneously
  – High speed parsing

Outline
• Motivation
• High Speed Matching for Large Rulesets
• High Speed Parsing
• Evaluation
• Research Contributions

A Vulnerability Signature Example
• Data representations
  – For all the vulnerability signatures we studied, we only need numbers and strings
  – Number operators: ==, >, <, >=, <=
  – String operators: ==, match_re(.,.), len(.)
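The operators listed above are enough to evaluate a predicate such as `ver==1 && method=="put" && len(buf)>300` from the Research Challenges slide. The following is a minimal sketch of sequential matching over already-parsed fields; the tuple layout and the names `OPS`, `field_value`, `matches`, `sequential_match`, and `RULESET` are illustrative assumptions, not NetShield's actual data structures.

```python
import operator
import re

# Operators named on the slide for numbers and strings.
OPS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "match_re": lambda value, pattern: re.search(pattern, value) is not None,
}


def field_value(pdu, field):
    """Resolve 'name' or 'len(name)' against a dict of parsed PDU fields."""
    if field.startswith("len(") and field.endswith(")"):
        return len(pdu[field[4:-1]])
    return pdu[field]


def matches(signature, pdu):
    """A signature is a conjunction of (field, op, operand) matchers."""
    return all(OPS[op](field_value(pdu, field), operand)
               for field, op, operand in signature)


def sequential_match(ruleset, pdu):
    """Baseline approach: test every signature in turn."""
    return [name for name, sig in ruleset.items() if matches(sig, pdu)]


# Hypothetical two-rule ruleset; the first rule is the slide's example predicate.
RULESET = {
    "put_overflow": [("ver", "==", 1), ("method", "==", "put"), ("len(buf)", ">", 300)],
    "dotdot": [("buf", "match_re", r"\.\./")],
}
```

The cost of `sequential_match` grows linearly with the number of signatures, which is the scaling problem the talk sets out to address.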
• Example signature for Blaster worm
  BIND: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && context[0].abstract_syntax.uuid==UUID_RemoteActivation
  BIND-ACK: rpc_vers==5 && rpc_vers_minor==1
  CALL: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && stub.RemoteActivationBody.actual_length>=40 && matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)

Matching Problem Formulation
• Consider single PDU matching first
• Suppose we have n signatures, defined on k matching dimensions (matchers)
  – A matcher is a two-tuple (field, operation), or a four-tuple for associative array elements.
  – Translate the n signatures to an n by k table.
• Example: Rule 6: URI.Filename=="fp40reg.dll" && len(Headers["host"])>300

Matching Problem Formulation
• Challenges for the single PDU matching problem (SPM)
  – Large number of signatures n
  – Large number of matchers k
  – Large number of "don't cares"
  – Cannot reorder matchers arbitrarily (buffering constraint)
  – Field dependency
    • Arrays, associative arrays
    • Mutually exclusive fields

Matching Algorithms
Candidate Selection Algorithm
1. Pre-computation decides the rule order and matcher order
2. Divide-and-conquer comparison with matchers, iteratively combining the results efficiently

Step 1: Pre-Computation
• Matcher reorder: put the non-selective matchers later, subject to the buffering constraint & field arrival order
• Rule reorder: group rules into rule blocks (RB)
  – Matcher 1: RB1 | rules that don't care about matcher 1
  – Extended by matcher 2: RB1, RB2 | rules that don't care about both matchers 1 & 2
  – ...
  – RB1, RB2, RB3, RB4, ... | rules that don't care about any of matchers 1 to n

Step 2: Iterative Matching
PDU = {Method=POST, Filename=fp40reg.dll, VARs: name="file"; value~".*\.\./.*", Headers: name="host"; len(value)=450}
• Rule blocks: RB1 = {1, 2, 3}, RB2 = {4, 5, 6}, RB3 = {7}, RB4 = {8}, RB5 = {9}
• Candidate sets after each matcher:
  – S1 = {3}
  – S2 = merge(S1, A2) + B2 = merge({3}, {}) + {6} = {} + {6} = {6}
  – S3 = merge(S2, A3) + B3 = merge({6}, {}) + {} = {6} + {} = {6}
  – S4 = merge(S3, A4) + B4 = merge({6}, {4}) + {} = {6} + {} = {6}
  – S5 = merge(S4, A5) + B5 = merge({6}, {6}) + {} = {6} + {} = {6}

Candidate merge operation
• merge(Si, Ai+1) keeps a candidate in Si when either
  – it doesn't care about matcher i+1, or
  – it requires matcher i+1 and is in Ai+1

Refinement and Extension
• SPM improvement
  – Allow negative conditions
  – Handle the array case
  – Handle the associative array case
  – Handle the mutually exclusive case
  – Report the matched rules as early as possible
• Extend to Multiple PDU Matching (MPM)
  – Allow checkpoints

Observations
• PDU parse tree
  – Leaf nodes are integers or strings
  – Vulnerability signatures are mostly based on leaf nodes
• Observation 1: Only need to parse the fields related to signatures.
• Observation 2: Traditional recursive descent parsers, which need one function call per node, are too expensive.

Efficient Parsing with State Machines
• Studied eight protocols: HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP and DNS, as well as their vulnerability signatures.
• Pre-construct parsing state machines based on parse trees and vulnerability signatures.
• Common relationships among leaf nodes: (a) sequential, (b) branch, (c) loop, (d) derive

Example for WINRPC
• Rectangles are states
• Parsing variables: R0 .. R4
• 0.61 instruction/byte for BIND PDU
[Figure: parsing state machine for WINRPC. Header states with field sizes in bytes: rpc_vers (1), rpc_ver_minor (1), ptype (1), pfc_flags (1), packed_drep (4), frag_length (2); Bind and Bind-ACK branches with merged states (merge1, merge2, merge3), ncontext, padding, ID, n_tran_syn, UUID (16), UUID_ver (4), tran_syn; loop over contexts with R2 ← 0, R3 ← ncontext, R2++, R2 ≤ R3]

Evaluation Methodology
• Fully implemented prototype
  – 11,704 lines of C++ and 2,706 lines of Python
  – Can run on both Linux and Windows
• Deployed at a university DC with up to 106 Mbps
• 26GB+ traces from Tsinghua Univ. (TH), Northwestern (NU) and DARPA
• Run on a P4 3.8 GHz single core PC with 4 GB memory
• Measured after TCP reassembly, with the PDUs preloaded in memory
• For HTTP we have 794 vulnerability signatures, which cover 973 Snort rules.
• For WINRPC we have 45 vulnerability signatures, which cover 3,519 Snort rules.

Parsing Results
  Trace                                TH DNS   TH WINRPC   NU WINRPC   TH HTTP   NU HTTP   DARPA HTTP
  Throughput (Gbps), Binpac            0.31     1.41        1.11        2.10      14.2      1.69
  Throughput (Gbps), our parser        3.43     16.2        12.9        7.46      44.4      6.67
  Speed up ratio                       11.2     11.5        11.6        3.6       3.1       3.9
  Max. memory per connection (bytes)   15       15          15          14        14        14

Matching Results
  Trace                                TH WINRPC   NU WINRPC   TH HTTP   NU HTTP   DARPA HTTP
  Throughput (Gbps), sequential        10.68       9.23        0.34      2.37      0.28
  Throughput (Gbps), CS matching       14.37       10.61       2.63      17.63     1.85
  Matching only time speed up ratio    4           1.8         11.3      11.7      8.8
  Avg # of candidates                  1.16        1.48        0.033     0.038     0.0023
  Max. memory per connection (bytes)   27          27          20        20        20

Other Results
Rule scaling results
[Figure: throughput (Gbps) against # of rules used]
• Performance decreases gracefully
Compare with Regex
• Memory for 973 Snort rules: DFA 5.29 GB (XFA, 863 rules: 1.08 MB), NetShield 2.3 MB
• Per flow memory: XFA 36 bytes, NetShield 20 bytes
• Throughput: XFA 756 Mbps, NetShield 1.9+ Gbps
* XFA [SIGCOMM08][Oakland08]

Research Contributions
• Demonstrate that vulnerability signatures can be applied to NIDS/NIPS, which can significantly improve the accuracy of current NIDS/NIPS
• Propose the candidate selection algorithm for matching a large number of vulnerability signatures efficiently
• Propose the parsing state machine for fast protocol parsing
• Implement NetShield

Future work
• Work in progress
  – In collaboration with MSR: apply semantics-rich analysis to cloud Web service profiling, to understand why services are slow and how to improve them.
• Future work
  – Web security (browser security, web server security)
  – Data center security
  – High speed network intrusion prevention system with hardware support

Long Term Research Challenges
• Combat the professional, profit-driven attackers.
• Online applications (including Web 2.0 applications) are becoming more complex and vulnerable.
• Network speed keeps increasing, which demands highly scalable approaches.

Q&A
Thanks!

Backup Slides

Measure Snort Rules
• Semi-manually classify the rules
  1. Group by CVE-ID
  2. Manually look at each vulnerability
• Results
  – 86.7% of rules can be improved by protocol semantic vulnerability signatures.
  – Most of the remaining rules (9.9%) are related to web DHTML and scripts, which are not suitable for a signature based approach.
  – On average, 4.5 Snort rules are reduced to one vulnerability signature.
  – For binary protocols the reduction ratio is much higher than for text based ones.
    • For netbios.rules the ratio is 67.6.
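Step 1 of the classification and the reduction ratio reported above could be computed along the following lines. This is only a sketch: the rule records and CVE labels are made-up placeholders, `group_by_cve` and `reduction_ratio` are hypothetical helpers, and it assumes exactly one vulnerability signature per CVE group, which the manual step 2 may refine.

```python
from collections import defaultdict


def group_by_cve(rules):
    """Map each CVE ID to the list of rule IDs that reference it."""
    groups = defaultdict(list)
    for rule_id, cve_id in rules:
        groups[cve_id].append(rule_id)
    return dict(groups)


def reduction_ratio(groups):
    """Average number of rules that collapse into one signature per group."""
    total_rules = sum(len(rule_ids) for rule_ids in groups.values())
    return total_rules / len(groups)


# Placeholder data: not real Snort rule IDs or CVE identifiers.
RULES = [
    (1001, "CVE-A"), (1002, "CVE-A"), (1003, "CVE-A"),
    (1004, "CVE-B"), (1005, "CVE-B"),
    (1006, "CVE-C"),
]

GROUPS = group_by_cve(RULES)
RATIO = reduction_ratio(GROUPS)  # 6 rules over 3 groups
```

With this toy data six rules fall into three groups, so the ratio is 2.0; the 4.5 and 67.6 figures on the slide come from the real Snort ruleset, not from this sketch.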
Motivation
• Network security has been recognized as the single most important attribute of their networks, according to a survey of 395 senior executives conducted by AT&T
• Many newly emerging threats make the situation even worse

System Framework
• Part I: Sketch-based monitoring & detection (scalability)
  – Reversible k-ary sketch monitoring over streaming packet data
  – Local sketch records, sent out for aggregation; remote aggregated sketch records
  – Sketch based statistical anomaly detection (SSAD)
• Part II: Polymorphic worm signature generation (accuracy & adapt fast)
  – Token Based Signature Generation (TOSG)
  – Length Based Signature Generation (LESG)
• Part III: Signature matching engines (accuracy & scalability & coverage)
  – Content-based signature matching
  – Protocol semantic signature matching
• Part IV: Network Situational Awareness (accuracy & adapt fast)
• Honeynets/honeyfarms receive traffic to unused IP blocks
• Legend: data path, control path; modules on the critical path, modules on the non-critical path

Example of Vulnerability Signatures
• At least 75% of vulnerabilities are due to buffer overflow
• Sample vulnerability signature
  – Field length corresponding to the vulnerable buffer > certain threshold
  – Intrinsic to the buffer overflow vulnerability and hard to evade
[Figure: a protocol message field overflowing the vulnerable buffer]

Old Slides

Conclusions
• A novel network-based vulnerability signature matching engine
  – Through a measurement study on the Snort ruleset, show that vulnerability signatures can improve most of the signatures in NIDS/NIPS.
  – Propose the parsing state machine for fast parsing
  – Propose a candidate selection algorithm for matching a large number of vulnerability signatures simultaneously

Outline
• Motivation
• Feasibility Study: a measurement approach
• Problem Statement
• High Speed Parsing
• High Speed Matching for massive vulnerability Signatures
• Evaluation
• Conclusions

Limitations of Regular Expression Signatures
• Signature: 10.*01
[Figure: traffic filtering between the Internet and our network; of the flows 1010101, 10111101, 11111100, and 00010111, two are blocked (X)]
• Polymorphism! A polymorphic attack (worm/botnet) might not have an exact regular expression based signature

What we do?
• Build a NIDS/NIPS with much better accuracy than, and similar speed to, regular expression based approaches
  – Feasibility: of the Snort ruleset (6,735 signatures), 86.7% can be improved by vulnerability signatures.
  – High speed parsing: 2.7~12 Gbps
  – High speed matching:
    • Efficient algorithm for matching massive vulnerability rules
    • HTTP, 791 vulnerability signatures at ~1 Gbps

Problem Formulation
• Parsing problem formulation
  – Given a PDU and the protocol specification as input, output the set of fields that are required by matching.

Publications
• Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE ICNP 2007.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, in IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007.
• Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006.
• Zhichun Li, Yan Chen and Aaron Beach, Towards Scalable and Robust Distributed Intrusion Alert Fusion with Good Load Balancing, in Proc. of ACM SIGCOMM LSAD 2006.
• Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE ICDCS 2006.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006.

Current Status
• Part I: Sketch based monitoring & detection
  – Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, in IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007
  – Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
  – Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%) (Alphabetical order)
• Part II: Polymorphic worm signature generation
  – TOSG: Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
  – LESG: Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)

Current Status
• Part III: Signature matching engines
  – Work in progress, will be the focus of this talk
  – Zhichun Li, Gao Xia, Yi Tang, Jian Chen, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching, in submission
• Part IV: Network Situational Awareness
  – Work in progress
  – Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets, in preparation
  – Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic, in submission

Current Status
• Part I: Sketch based monitoring & detection
  – Results in [Infocom06, ToN, ICDCS06]
• Part II: Polymorphic worm signature generation
  – Results in [Oakland06, ICNP07]
• Part III: Signature matching engines
  – Work in progress, will be the focus of this talk
• Part IV: Network Situational Awareness
  – Work in progress

Limitations of Exploit Based Signature
• Signature: 10.*01
[Figure: traffic filtering between the Internet and our network; of the flows 1010101, 10111101, 11111100, and 00010111, two are blocked (X)]
Polymorphism!
• A polymorphic worm might not have an exact exploit based signature

Vulnerability Signature
[Figure: vulnerability signature traffic filtering between the Internet and our network; traffic that targets the vulnerability is blocked (X)]
• Works for polymorphic worms
• Works for all the worms that target the same vulnerability
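The contrast between the two signature styles can be illustrated with a toy sketch. Only the `10.*01` pattern comes from the slides; the field name `buf`, the 300-byte threshold `MAX_LEN`, and all three payloads are invented for illustration, and a real vulnerability signature would be evaluated on parsed protocol fields rather than a plain dict.

```python
import re

EXPLOIT_SIG = re.compile(r"10.*01")  # exploit based signature from the slides
MAX_LEN = 300                        # assumed size of the vulnerable buffer


def exploit_sig_hit(payload: str) -> bool:
    """Byte-pattern check: fires only if this exploit's bytes appear."""
    return EXPLOIT_SIG.search(payload) is not None


def vuln_sig_hit(fields: dict) -> bool:
    """Length check on the field that feeds the vulnerable buffer."""
    return len(fields["buf"]) > MAX_LEN


# Hypothetical messages: a known exploit, a polymorphic variant that
# overflows the same buffer with different bytes, and a short benign one.
original = {"buf": "10" + "A" * 400 + "01"}
variant = {"buf": "B" * 402}
benign = {"buf": "1001"}
```

Under these assumptions the regex catches `original`, misses `variant`, and also fires on the short `benign` message, while the length check flags both overflowing messages and leaves the benign one alone.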