Issues in Benchmarking Intrusion Detection Systems
Marcus J. Ranum <mjr@nfr.net>

IDS Benchmarking?
• How hard can it be to benchmark intrusion detection systems?
  – Very!
  – There are lots of ways to get it wrong
    • Accidentally
    • Deliberately
  – Avoiding doing it wrong does not necessarily mean you've done it right

What's an IDS?
• IDS = Intrusion Detection System
  – The primary criterion for measurement is the IDS' ability to detect intrusions
  – Secondary criteria for measurement are other issues:
    • False positives - false alarms
    • False negatives - real attacks that are missed
    • Performance impact - throughput delay or CPU usage on the host processor

Types of IDS
• Primary types:
  – Network IDS (NIDS)
  – Host IDS (HIDS)
• Hybrid types:
  – Per-Host Network IDS (PH-NIDS)
  – Load-Balanced Network IDS (LB-NIDS)
  – Firewall IDS (FW-IDS)

Properties of: Network IDS
• Collects packets in promiscuous mode
• Issues:
  – Packet collection rate - what is the maximum throughput?
  – Reassembly/defragmentation/reordering - what about traffic spoofing?
  – Selective analysis - is the IDS choosing to ignore some traffic in order to optimize?

Properties of: Host IDS
• Operates on host logs and processes
  – Sometimes forwards audit records to a central system for analysis
• Issues:
  – CPU usage on the host
  – What about packet-oriented attacks?
  – Per-platform (individual) view of attacks - a single system is monitored per agent

Properties of: Per-Host Network IDS
• A network IDS "shim" layer inserted into the network stack on each host
• Issues:
  – Has the properties of a network IDS
  – But:
    • Traffic is processed per-host only
    • Does not have the same performance as a NIDS
    • "Local"-only view of traffic (but no drops)

Properties of: Load-Balanced Network IDS
• Uses a load-balancing pre-processor to "spread" load across multiple NIDS
• Issues:
  – Can scale to "infinite" bandwidth
  – Total cost of the solution is not single-unit pricing (requires a switch + multiple NIDS)

Properties of: Firewall IDS
• Places network IDS capability in a firewall or bridge-type device
• Issues:
  – No packet-loss issues (retransmits take care of packets that are lost)
  – (May) slow down network throughput

Other Issues
• Other things affecting speed and detection ability:
  – TCP fragment re-assembly
  – TCP packet re-ordering
  – TCP state/sequence tracking
  – Analyzing only selected sessions

Fragment Re-assembly
• Re-assembling fragments takes significant CPU time as well as memory to buffer packets
  – The IDS can be negatively impacted by faked fragments intended to consume extra memory
  – How does the IDS handle fragmented attacks? Does it simply alert "I see fragmented traffic," or does it de-fragment and then apply IDS logic?

Packet Re-ordering
• Re-ordering packets requires significant CPU as well as memory for packet buffering
  – The IDS can be impacted by unintentional or deliberate packet drops, since it tries to buffer out-of-sequence packets
  – How does the IDS handle re-ordering? Does it just flag out-of-sequence packets, or does it re-order and then apply IDS logic?

TCP State Tracking
• Tracking TCP states requires maintaining per-session information
  – The IDS is impacted by the number of simultaneous streams
  – The IDS is impacted by randomized traffic
  – The IDS is harder to fool with faked out-of-sequence FIN packets

Analyzing Selected Sessions
• An IDS can "optimize" performance by only reassembling or tracking TCP sessions associated with known signatures (see the sketch after this slide)
  – The IDS might have extremely good performance against random traffic but poor performance against (e.g.) Web traffic
  – The tradeoff is coverage versus performance; vendors do not usually document this
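To make the per-session cost described in the last few slides concrete, here is a minimal sketch (my illustration, not code from this talk and not any vendor's implementation) of how a NIDS might keep per-session TCP state and apply selective analysis. The Segment structure, the MONITORED_PORTS table, and all field names are hypothetical stand-ins for whatever a real packet decoder would provide.

from dataclasses import dataclass

# Hypothetical, simplified view of a decoded TCP segment; a real NIDS
# would fill these fields in from raw packets.
@dataclass
class Segment:
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    flags: str            # e.g. "S" for SYN, "F" for FIN, "" for plain data
    payload: bytes = b""

# Ports the (hypothetical) signature set actually covers; sessions to any
# other port are ignored -- this is the "selective analysis" shortcut.
MONITORED_PORTS = {21, 25, 80}

class SessionTracker:
    def __init__(self):
        self.sessions = {}    # (src, sport, dst, dport) -> per-session state

    def feed(self, seg: Segment):
        # Selective analysis: skip sessions we have no signatures for.
        if seg.dport not in MONITORED_PORTS:
            return
        key = (seg.src, seg.sport, seg.dst, seg.dport)
        if "S" in seg.flags and key not in self.sessions:
            # New session: remember the initial sequence number so that
            # out-of-order data can be buffered by relative offset.
            self.sessions[key] = {"isn": seg.seq, "buffer": {}, "closed": False}
            return
        state = self.sessions.get(key)
        if state is None or state["closed"]:
            return            # data for a session we never saw open, or one already torn down
        if seg.payload:
            # Buffer payload keyed by offset; a later pass would reassemble
            # the stream in order and run signature matching over it.
            state["buffer"][seg.seq - state["isn"]] = seg.payload
        if "F" in seg.flags:
            state["closed"] = True

if __name__ == "__main__":
    t = SessionTracker()
    t.feed(Segment("10.0.0.1", 1025, "10.0.0.2", 80, 1000, "S"))
    t.feed(Segment("10.0.0.1", 1025, "10.0.0.2", 80, 1001, "", b"GET / HTTP/1.0\r\n\r\n"))
    print(t.sessions)

Each tracked session holds buffered payload, which is exactly the memory and CPU cost the fragment-reassembly and state-tracking slides warn about; shrinking MONITORED_PORTS is the coverage-versus-performance tradeoff, and it is also what makes the "skunking" scenarios later in the deck possible.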
Naïve Simulation Network
[Diagram: an Attack Generator sends an attack stream onto a flat test network containing the Target Host; the NIDS watches the test network]

What's Wrong?
• The naïve test network permits traffic that is not likely to be seen in a "real world" deployment - e.g., ARP cache poisoning (you see a lot of this on DEFCON CTF networks)
• The presence of a router would "smooth" spikes somewhat and actually achieve higher sustained loads

Naïve Simulation Network #2
[Diagram: an Attack Generator and a SmartBits load generator on Test Network #1, connected through a router with some screening to Test Network #2, which holds the Target Host and is watched by the NIDS]

What's Wrong?
• SmartBits-style traffic generators do not generate "real" TCP traffic
  – This penalizes IDSes that actually look at streams and try to reassemble them (which are desirable properties of a good IDS)

Skunking a Benchmark
[Diagram: a SmartBits load generator and an Attack Generator on a test network with three Target Hosts, each running a per-host network IDS]

What's Wrong?
• Packet-count-style load figures are not relevant to a per-host network IDS

Skunking a Benchmark: #2
[Diagram: a SmartBits load generator and an Attack Generator on a test network with a Target Host, watched by a NIDS with selective detection turned on]

What's Wrong?
• An IDS with selective detection can be configured to only look at traffic aimed at the local subnet
  – The SmartBits-style generator's random traffic largely gets seen and discarded

Effective Simulation Network
[Diagram: recorded attack and normal traffic on a hard disk is replayed and the packets dumped back onto the test network, which is watched by the NIDS]

What's Wrong?
• Nothing:
  – Predictable baseline
  – Can verify the traffic rate with simple math (e.g., a 600 MB capture replayed over 60 seconds is roughly 80 Mbit/s)
  – Can scale load arbitrarily (use multiple machines, each with different capture data)
  – Traffic is real, including "real" data contents
  – The NIDS cannot be configured to watch a specific machine (there are no targets)

Tools to Use
• Fragrouter - generates fragmented packets
• Whisker - generates out-of-sequence packets
• Pcap-pace - replays packets from a hard disk with original inter-packet timing (a minimal replay sketch appears after the summary)

Summary
• It's easy to skunk an intrusion detection benchmark
• It's hard to design a good intrusion detection benchmark
• If you want to see whether a given system works, the best way to find out is to try it on your actual network
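As a closing illustration of the capture-and-replay approach from the "Effective Simulation Network" slide, here is a minimal sketch of replaying a recorded capture with its original inter-packet timing. This is not the Pcap-pace tool itself; it assumes a Linux host with raw-socket (root) privileges, a classic microsecond-resolution libpcap file (not pcapng), and the purely illustrative names replay.pcap and eth0.

import socket
import struct
import time

PCAP_FILE = "replay.pcap"   # hypothetical capture file
INTERFACE = "eth0"          # hypothetical replay interface

def read_packets(path):
    """Yield (timestamp_seconds, frame_bytes) from a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)                       # global pcap header
        magic = struct.unpack("<I", header[:4])[0]
        endian = "<" if magic == 0xA1B2C3D4 else ">"
        while True:
            rec = f.read(16)                      # per-packet record header
            if len(rec) < 16:
                break
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            frame = f.read(incl_len)
            yield ts_sec + ts_usec / 1e6, frame

def replay(path, iface):
    # Raw AF_PACKET socket: frames are re-injected exactly as captured.
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    sock.bind((iface, 0))
    prev_ts = None
    for ts, frame in read_packets(path):
        if prev_ts is not None:
            # Preserve the original inter-packet gap.
            time.sleep(max(0.0, ts - prev_ts))
        sock.send(frame)
        prev_ts = ts

if __name__ == "__main__":
    replay(PCAP_FILE, INTERFACE)

Because the replayed frames carry whatever addresses were captured, there is no single "target" for the NIDS to be pointed at, which is exactly the property the "Effective Simulation Network" slide relies on.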