A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks
Yan Gao, Zhichun Li, Yan Chen
Department of EECS, Northwestern University
Presented by
Sudarsan Vinay Maddi
Christopher Brandon Barkley

Outline
• Motivation
• Background on Sketches
• Design of the HiFIND system
• Evaluation
• Conclusion

The Problem
• The increasing frequency, severity, and sophistication of viruses make it critical to detect outbursts at routers and gateways instead of end hosts.

Current Intrusion Detection Systems
• Signature-based detection
• Anomaly-based detection

Signature-based Intrusion Detection
• Examples: Bro, Snort
• Performs pattern matching and reports situations that match known attack types.
• Advantage: accurately detects known attack types.
• Disadvantage: attackers can modify or create attacks that avoid detection until the signatures are updated.

Anomaly-based Intrusion Detection
• Example: Manhunt
• Builds a model of acceptable behavior and flags exceptions using heuristics.
• Advantage: the model is built from actual use and can detect previously unknown attacks.
• Disadvantage: the heuristic model can lead to false positives, and the system is inaccurate at the beginning, when it has little information.

Existing Network IDSes Insufficient
• Signature-based IDSes cannot recognize unknown or polymorphic intrusions
• Statistical IDSes come to the rescue, but
  • Flow-level detection: unscalable
    • Vulnerable to DoS attacks
    • e.g. TRW [IEEE SSP 04], TRW-AC [USENIX Security Symposium 04], Superspreader [NDSS 05] for port scan detection
  • Overall-traffic-based detection: inaccurate, high false positives
    • e.g. Change Point Monitoring for flooding attack detection [IEEE Trans. on DSC 04]

Existing Network IDSes Insufficient (cont.)
• Key features missing
  • Distinguishing SYN flooding from various port scans for effective mitigation
  • Aggregated detection over multiple vantage points

Other Limitations
• Another limitation of existing IDSes is that they are implemented in software.
• Software-based data recording has trouble keeping up with the link speeds of high-speed routers.
• To solve this, data recording must be implementable in hardware.

HiFIND System
• The main goal is to develop an accurate High-speed Flow-level Intrusion Detection (HiFIND) system
  • Leverage data streaming techniques: reversible sketches
  • Select an optimal small set of metrics from TCP/IP headers for monitoring and detection
  • Aggregate compact sketches from multiple routers for distributed detection

Goals of HiFIND
• Scalable to flow-level detection on high-speed networks
• DoS resilient
• Distinguish SYN flooding from port scans
• Enable aggregate detection over multiple gateways
• Separate anomalies to limit false positives

Deployment of HiFIND
• Attached to a router/switch as a black box
• Edge network detection is particularly powerful
[Figure: deployment configurations showing the Internet, router, switches, LANs, scan port, splitter, and HiFIND system. (a) Original configuration; (b) monitor each port separately; (c) monitor aggregated traffic from all ports.]

Reversible Sketches
• Traditional sketches do not store key information, which makes it hard to infer a culprit flow.
• Reversible sketches use a reversible hash function to infer the keys of culprits without storing explicit key information.
• More info: "Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams" by Schweller, Gupta, Parsons, and Chen of Northwestern University.

Two-Dimensional k-ary Sketch
• Instead of a one-dimensional hash table, use a 2D hash table matrix.
• Keeping track of more information makes it possible to distinguish between types of attacks.
• Example: columns are a hash of {SIP, DIP}; rows are a hash of Dport.
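As an illustration of the 2D k-ary sketch layout above, here is a minimal Python sketch. It is not the authors' implementation: the class name `TwoDSketch`, the table sizes, and the salted BLAKE2b hashing are all assumptions made for readability, whereas a real deployment would use hardware-friendly hash functions. It only shows how one packet increments one bucket per table, with the column chosen by a hash of {SIP, DIP} and the row by a hash of Dport.

```python
import hashlib


class TwoDSketch:
    """Toy 2D k-ary sketch: several tables, each a Ky x Kx counter matrix."""

    def __init__(self, num_tables=3, kx=64, ky=16):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_tables, self.kx, self.ky = num_tables, kx, ky
        self.tables = [[[0] * kx for _ in range(ky)] for _ in range(num_tables)]

    def _bucket(self, table, axis, key, size):
        # Salt the hash with the table index and axis so that every table
        # uses a different function (assumption: BLAKE2b stands in for the
        # hash family a real system would use).
        data = f"{table}:{axis}:{key}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % size

    def update(self, sip, dip, dport, delta=1):
        # One counter update per table: column from {SIP, DIP}, row from Dport.
        for t in range(self.num_tables):
            col = self._bucket(t, "x", (sip, dip), self.kx)
            row = self._bucket(t, "y", dport, self.ky)
            self.tables[t][row][col] += delta

    def column(self, table, sip, dip):
        # Counters of every row in the column that {SIP, DIP} hashes to.
        col = self._bucket(table, "x", (sip, dip), self.kx)
        return [self.tables[table][row][col] for row in range(self.ky)]


# Same source/destination pair, many ports: counts spread over the rows.
scan = TwoDSketch()
for port in range(1, 201):
    scan.update("2.3.0.5", "9.7.2.3", port)
scan_rows = sum(1 for c in scan.column(0, "2.3.0.5", "9.7.2.3") if c > 0)

# Same pair, a single port: all counts land in one row.
flood = TwoDSketch()
for _ in range(200):
    flood.update("2.3.0.5", "9.7.2.3", 80)
flood_rows = sum(1 for c in flood.column(0, "2.3.0.5", "9.7.2.3") if c > 0)
```

In this toy run the single-port traffic occupies exactly one row of its column while the many-port traffic spreads across rows, which is the kind of extra information the second dimension is meant to preserve.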

Outline
• Motivation
• Background on Sketches
• Design of the HiFIND system
  • Architecture
  • Sketch-based intrusion detection
  • Intrusion classification with 2D sketches
  • Feature analysis
• Evaluation
• Conclusion

Architecture of the HiFIND system
[Figure: HiFIND architecture. Recording stage: the real traffic stream goes through sketch recording into a reversible sketch and a 2D sketch, which are combined with reversible sketches and 2D sketches from other routers. Detection stage: the aggregated reversible sketch and aggregated 2D sketch feed time series analysis methods, which produce a forecast sketch and a forecast error sketch; Phases 1 to 3 cover threshold-based detection, intrusion classification, and false positive reduction, followed by attack mitigation.]

Architecture of the HiFIND system (cont.)
• Threat model
  • TCP SYN flooding (DoS attack)
  • Port scan
    • Horizontal scan
    • Vertical scan
    • Block scan
• Forecast methods
  • EWMA

Sketch-based Detection Algorithm
Keys           SYN flooding   Hscan   Vscan   Score
{SIP, Dport}   non-spoofed    Yes     No      1.5
{DIP, Dport}   Yes            No      No      1
{SIP, DIP}     non-spoofed    No      Yes     1.5
{SIP}          non-spoofed    Yes     Yes     2.5
{DIP}          Yes            No      Yes     2
{Dport}        Yes            Yes     No      2

Sketch-based Detection Algorithm (cont.)
• RS({DIP, Dport}, #SYN - #SYN/ACK)
  • Detects SYN flooding attacks
• RS({SIP, DIP}, #SYN - #SYN/ACK)
  • Detects any intruder trying to attack a particular IP address
• RS({SIP, Dport}, #SYN - #SYN/ACK)
  • Detects any source IP that causes a large number of uncompleted connections to a particular destination port

Intrusion Classification
• Major challenge
  • The sketches above cannot completely differentiate between types of attacks
  • E.g., if the destination port distribution is unknown, it is hard to distinguish non-spoofed SYN flooding attacks from vertical scans using RS({SIP, DIP}, #SYN - #SYN/ACK)

Intrusion Classification (cont.)
• Bi-modal distribution
[Figure: distributions for SYN floodings and vertical scans.]

Two-dimensional (2D) Sketch
• For example: differentiate a vertical scan from a SYN flooding attack
• The two-dimensional k-ary sketches
[Figure: H two-dimensional hash matrices, each with Ky rows indexed by hy(Dport) and Kx columns indexed by hx(SIP, DIP).]
• An example of the UPDATE operation
  • Packet {2.3.0.5, 9.7.2.3, 80, SYN} adds +1 to the bucket at row hy(80), column hx(2.3.0.5, 9.7.2.3)

DoS Resilience Analysis
• The HiFIND system is resilient to various DoS attacks, as follows
  • Sending source-spoofed SYN packets to a fixed destination
    • Detected as a SYN flooding attack
  • Sending source-spoofed packets to random destinations
    • Evenly distributed across the buckets of each hash table, so no false positives
  • Reverse-engineering the hash functions to create collisions
    • The hash functions are difficult to reverse-engineer
      • The output of each hash function is unknown
      • Multiple hash tables and different hash functions are used
    • Even if the hash functions of the sketches are known
      • It is very hard to find collisions through exhaustive search

Distributed Intrusion Detection
• Naive solution: transport all the packet traces or connection states to the central site
• HiFIND: summarize the traffic with compact sketches at each edge router, and deliver them to the central site

Evaluation Methodology
• Router traffic traces
  • Lawrence Berkeley National Laboratory
    • One-day trace with ~900M netflow records
  • Northwestern University
    • One-day experiment in May 2005 with 239M netflow records, 1.8 TB of traffic, and 1:1 packet samples
• Evaluation metrics
  • Detection accuracy
  • Online performance
    • Speed
    • Memory consumption
    • Memory accesses per packet

Highly Accurate
[Figure: detection stage of the HiFIND architecture (aggregated reversible and 2D sketches, time series analysis methods, forecast and forecast error sketches, Phases 1 to 3, attack mitigation).]

Detection Validation
• SYN flooding: backscatter
• Hscans and Vscans: knowledge of port numbers, e.g.
5 major scenarios of the top 10 Hscans:
Anonymized SIP     Dport   # DIP   Cause
204.10.110.38      1433    56275   SQLSnake scan
5.4.247.103        1433    54788   SQLSnake scan
109.132.101.199    22      45014   Scan SSH
95.30.62.202       3306    25964   MySQL Bot scans
15.192.50.153      4899    23687   Rahack worm

Detection Validation (cont.)
e.g. 5 major scenarios of the bottom 10 Hscans:
Anonymized SIP     Dport   # DIP   Cause
98.198.251.168     135     64      Nachi or MSBlast worm
3.66.52.227        445     64      Sasser and Korgo worm
2.0.28.90          139     64      NetBIOS scan
98.198.0.101       135     64      Nachi or MSBlast worm
165.5.42.10        5554    62      Sasser worm

Online performance evaluation
• Few memory accesses per packet
  • 16 memory accesses per packet with parallel recording
• Small memory consumption

Online performance evaluation (cont.)
• Recording speed
  • Worst case: recording 239M items in 20.6 seconds, i.e., 11M insertions/sec
• Detection speed
  • Detection on 1430 one-minute intervals
    • Average detection time: 0.34 seconds
    • Maximum detection time: 12.91 seconds
  • Stress experiments in each one-hour interval
    • Detecting the top 100 anomalies takes 35.61 seconds on average and 46.90 seconds at most

Conclusion - Advantages
• Achieves the proposed goals, including scalability and distinguishing attack types.
• Highly accurate on test data.
• Reduction in false positives.
• Very low memory usage (13.2 MB).

Conclusion - Disadvantages
• HiFIND did not detect some small horizontal port scans that TRW detected.
• The authors said these were a combination of multiple small scans too stealthy for their thresholds.
• Future work is to investigate this further and find a way to account for it.

Conclusion - Paper Disadvantages
• The authors are vague on implementation, mentioning only that it used a single FPGA board.
• The authors do not explicitly define terms (e.g. sketches).
• The authors do not explain or cite the heuristics used to reduce false positives.

Thank You! Questions?