Pegasus: Precision Hunting for Icebergs and Anomalies in Network Flows
Sriharsha Gangam¹, Puneet Sharma², Sonia Fahmy¹
¹Purdue University, ²HP Labs
This research has been sponsored in part by GENI project 1723, and by Hewlett-Packard

Passive Flow Monitoring
[Figure: Network Devices (e.g., switches) send Monitoring Data to Collection & Analysis]
• Detect network congestion, attacks, faults, and anomalies; support traffic engineering and accounting
• Observe and collect traffic summaries
• e.g., InMon Traffic Sentinel [InMon] uses sFlow; Cisco's NetFlow is used in ISPs
[InMon] http://inmon.com

Passive Flow Monitoring Challenges
• Large overhead to collect and analyze fine-grained flow data
• Increasing link speeds, network size and traffic
  o Limited CPU, memory resources at the routers
  o Millions of flows in ISP networks
• Current Techniques?
  o NetFlow sampling rate in ISPs ~ 1 in 100 (Internet2)
  o sFlow packet sampling rate ~ 1 in 2000
  o Application dependent sketches
  o Fine-grained information is lost

Will More Resources Help?
• Commercial co-located compute and storage
  o HP ONE Blades
  o Cisco SRE Modules
• Example configuration
  o 2.20 GHz Core Duo processor
  o 4 GB RAM, 250 GB HD
  o 2x10 Gbps duplex bandwidth to switch
• Storage and analysis of fine-grained flow statistics
  o Distributed monitoring applications

Design Space
[Figure: design space; axes: Network Overhead, Additional Compute & Storage; labels: Naïve Solution, Ideal Solution, Impractical, Current Solutions: Sampling and Sketching]
Our Goals: Pegasus - Accurate & low overhead monitoring

Key Class of Applications
• Network bottlenecks
  o Top traffic destinations, sources, and links
• Suspicious port scanning activity
  o Sources that connect to more than 10% of hosts within time T
• DDoS attack detection
  o Destinations with large number of connections or traffic

Global Iceberg Detection
  o Global heavy hitters
Items contributing > 1% (θ) traffic?
Monitoring Data (item: count):
  h1: 20, h2: 10, h4: 50, …
  h2: 60, h3: 15, h4: 20, …
  h3: 50, h5: 10, h6: 30, …
[Figure label: Network Devices, e.g., switches]
• Items with aggregate count exceeding a threshold (S × θ)
• Observations at any single switch/router may not be significant or interesting
  o E.g., DDoS attack

Online Iceberg Detection with Pegasus
• Reduce communication overhead
  o Additional compute and storage
• Precisely detect all global icebergs (high precision)
  o Zero false positives and false negatives
• Feedback based iterative approach (iterative solution)

Comparison of Different Approaches
• Naïve Approach: prohibitively large monitoring data
• Sampling and Sketching: lossy summary, false +ves and -ves
• Pegasus: lossy summary, sketch-sets
[Figure: monitors at Network Devices (e.g., switches) send data to Collection & Analysis (Aggregator)]
Monitor data (item: count):
  i1: 20, i2: 10, i4: 50, …
  i2: 60, i3: 15, i4: 20, …
  i3: 50, i5: 10, i6: 30, …

1-D Sketch-set Representation
• Sketch-set: Summary representation of a collection of flows, supports set operations

Coarse Sketch-set Generation
(Destination IP, Packet Count)
  128.41.10.10, 15
  128.41.10.20, 20
  128.41.10.30, 15
  128.41.10.40, 30
  128.41.10.50, 25
  128.41.10.110, 110
  128.41.10.150, 100
  128.41.10.210, 300
Coarse-grained sketch-sets (slide parameters: α, β)
(startIP, endIP, minPkt, maxPkt)
  128.41.10.10, 128.41.10.50, 15, 30
  128.41.10.110, 128.41.10.150, 100, 110
  128.41.10.210, 128.41.10.210, 300, 300
Example: Destination IPs receiving more than 200 packets

Example Coarse-grained Sketch-sets
[Figure labels: INTERSECTION, SUBTRACTION]
Monitor 1 (startIP, endIP, minPkt, maxPkt)
  128.41.10.10, 128.41.10.50, 15, 30
  128.41.10.110, 128.41.10.150, 100, 110
  128.41.10.210, 128.41.10.210, 300, 300
Monitor 2
  128.41.10.35, 128.41.10.70, 10, 35
  128.41.10.100, 128.41.10.120, 90, 130
Aggregator: Disjoint Sketch-sets
Nonicebergs:
  128.41.10.10, 128.41.10.34, 15, 30
  128.41.10.35, 128.41.10.50, 10, 65
  128.41.10.51, 128.41.10.70, 10, 35
  128.41.10.100, 128.41.10.109, 90, 130
  128.41.10.121, 128.41.10.150, 100, 110
Query monitors (uncertain):
  128.41.10.110, 128.41.10.120, 90, 240
Iceberg:
  128.41.10.210, 128.41.10.210, 300, 300
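
The coarse-grained example above can be reproduced with a short Python sketch. It is illustrative only and is not the Pegasus implementation. The grouping rule (`max_spread`, a bound on maxPkt - minPkt) is a hypothetical stand-in for the slide's α and β parameters, whose exact meaning is not given here. The bound rule in `combine` (lower bound = smallest minPkt, upper bound = sum of maxPkt) is inferred from the numbers in the example, and all function names are made up for illustration.

```python
from ipaddress import IPv4Address

THRESHOLD = 200  # "more than 200 packets", taken from the slide example


def ip(text):
    """Convert dotted-quad text to an integer so ranges can be compared."""
    return int(IPv4Address(text))


def coarse_sketch_sets(flows, max_spread):
    """Greedily group IP-sorted (ip, pkts) flows into (start, end, minPkt, maxPkt).

    Assumed rule: a flow joins the current sketch-set while
    maxPkt - minPkt stays within max_spread (a hypothetical parameter).
    """
    sets = []
    for addr, pkts in sorted(flows):
        if sets:
            start, _, lo, hi = sets[-1]
            new_lo, new_hi = min(lo, pkts), max(hi, pkts)
            if new_hi - new_lo <= max_spread:
                sets[-1] = (start, addr, new_lo, new_hi)
                continue
        sets.append((addr, addr, pkts, pkts))
    return sets


def combine(monitors):
    """Cut all monitors' sketch-sets into disjoint IP ranges with aggregate bounds.

    Lower bound: smallest minPkt among covering sets (an IP may be seen by
    only one monitor). Upper bound: sum of maxPkt over covering sets.
    Assumes the sketch-sets of a single monitor do not overlap each other.
    """
    cuts = sorted({s for m in monitors for s, _, _, _ in m}
                  | {e + 1 for m in monitors for _, e, _, _ in m})
    out = []
    for a, b in zip(cuts, cuts[1:]):
        cover = [(lo, hi) for m in monitors
                 for s, e, lo, hi in m if s <= a and b - 1 <= e]
        if cover:
            out.append((a, b - 1, min(lo for lo, _ in cover),
                        sum(hi for _, hi in cover)))
    return out


def classify(lo, hi):
    """Decide what the aggregator can conclude from the bounds alone."""
    if lo > THRESHOLD:
        return "iceberg"
    if hi <= THRESHOLD:
        return "non-iceberg"
    return "uncertain"  # the aggregator must query the monitors


m1_flows = [(ip("128.41.10.10"), 15), (ip("128.41.10.20"), 20),
            (ip("128.41.10.30"), 15), (ip("128.41.10.40"), 30),
            (ip("128.41.10.50"), 25), (ip("128.41.10.110"), 110),
            (ip("128.41.10.150"), 100), (ip("128.41.10.210"), 300)]
monitor1 = coarse_sketch_sets(m1_flows, max_spread=20)
monitor2 = [(ip("128.41.10.35"), ip("128.41.10.70"), 10, 35),
            (ip("128.41.10.100"), ip("128.41.10.120"), 90, 130)]

for start, end, lo, hi in combine([monitor1, monitor2]):
    print(IPv4Address(start), IPv4Address(end), lo, hi, classify(lo, hi))
```

With `max_spread=20`, Monitor 1's eight flows collapse into the three sketch-sets shown on the slide, and `combine` yields the same seven disjoint ranges, with only 128.41.10.110 to 128.41.10.120 left uncertain.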
Example…Query Response
Aggregator Query: (128.41.10.110, 128.41.10.120)
Monitor 1
  Lookup relevant flows:
    128.41.10.110, 110
  Generate sketch-sets (finer granularity):
    128.41.10.110, 128.41.10.110, 110, 110
Monitor 2
  Lookup relevant flows:
    128.41.10.110, 90
    128.41.10.120, 130
  Generate sketch-sets (finer granularity):
    128.41.10.110, 128.41.10.110, 90, 90
    128.41.10.120, 128.41.10.120, 130, 130

Example…Query Response
Aggregator Query: (128.41.10.110, 128.41.10.120)
Fine-grained sketch-sets
Monitor 1
  128.41.10.110, 128.41.10.110, 110, 110
Monitor 2
  128.41.10.110, 128.41.10.110, 90, 90
  128.41.10.120, 128.41.10.120, 130, 130
Aggregator (result labels on the slide: Iceberg, Nonicebergs)
  128.41.10.110, 128.41.10.110, 200, 200
  128.41.10.120, 128.41.10.120, 130, 130

Evaluation Methodology
• Abilene trace
  o NetFlow records: 11 sites with 1 in 100 sampling for 5 min
  o Add small flows to revert sampling
    (90% of flows contribute to 20% of traffic, ~ 758K unique flow records)
  o Trace is used in [Huang11]
• Enterprise network sFlow trace
  o sFlow records: 249 switches, 1 in 2000 sampling for a week
  o Revert sampling by adding flows
• PlanetLab's Outgoing Traffic
  o NetFlow records generated at each PlanetLab host
[Huang11] G. Huang, A. Lall, C. Chuah, and J. Xu. Uncovering global icebergs in distributed streams: Results and implications. J. Netw. Syst. Manage., 19:84–110, March 2011

Comparison with Sample-Sketch
• Sends sampled monitoring data and sketches to the aggregator for iceberg detection
• Uses two main parameters
  o Sampling interval
  o Sketch threshold
• Difficult to decide the parameters
• Can have false positives and false negatives
G. Huang, A. Lall, C. Chuah, and J. Xu. Uncovering global icebergs in distributed streams: Results and implications. J. Netw. Syst.
Manage., 19:84–110, March 2011

Abilene Trace
[Figure: results plotted against θ; larger is better]
For the 5 min trace, θ = 0.08
  - Naive solution: ≈ 7.63 MB
  - Pegasus: ≈ 8 KB
  - Sample-Sketch: ≈ 36 KB
Pegasus has lower communication overhead

Monitoring Outgoing PlanetLab Traffic
[Figure: Monitors on PlanetLab nodes send NetFlow records generated from outgoing traffic to the Aggregator]
• Example of end-host monitoring system
• Detect accidental attacks and anomalies originating from PlanetLab
• Existing monitoring service: PlanetFlow
  o Decouples collection from analysis
  o Collects 1 TB of data every month [PF] (naïve approach)
[PF] http://www.cs.princeton.edu/~sapanb/planetflow2/

Pegasus PlanetLab Service
• PlanetLab's outgoing traffic
  o NetFlow records of ~250 PlanetLab nodes
  o Online global iceberg detection service
• Global iceberg detection for
  o Flow identifier: Destination IP, Source Port, Destination Port
  o Flow size: Packet count

Pegasus PlanetLab Service
• 15 hour deployment
  - Pegasus: 403 MB, Naïve: 2.26 GB
• Most outbound traffic to other PlanetLab hosts
• 1-Day outgoing traffic:
    Source Port Icebergs    Destination Port Icebergs
    3 (CompressNET)         3 (CompressNET)
    8 (unassigned)          0 (Reserved)
    22 (SSH)                53 (DNS)
    80 (HTTP)               80 (HTTP), 443 (HTTPS)
• CoDNS and CoDeeN don't produce many icebergs

Conclusions
• Pegasus: A distributed measurement system
  o Commercial co-located compute and storage devices
  o Low network overhead
  o High accuracy
• Adaptive aggregation for the global iceberg detection
  o Iterative feedback solution
• Experiments from real traces and PlanetLab deployment
  o Low overhead without false +ves and -ves

Thank you
Questions?

Anomaly Examples
• Based on traffic features [Kind09]
[Kind09] Histogram-Based Traffic Anomaly Detection, In IEEE Trans. on Netwk.
Service Management

Related Work
• Threshold Algorithm (TA) [Fagin03]
  o Large number of iterations
• Three phase uniform threshold (TPUT) [Cao04]
  o Accounting data distributions [Yu05]
• Filtering based continuous monitoring algorithms [Babcock03] [Keralapura06] [Olston03]
  o Send update to aggregator when local arithmetic constraints fail
[Fagin03] Optimal aggregation algorithms for middleware. Jour. of Comp. and Sys. Sciences, 2003
[Cao04] Efficient Top-K Query Calculation in Distributed Networks. In Proc. of PODC, 2004
[Yu05] Efficient processing of distributed top-k queries. In Proc. of DEXA, 2005
[Babcock03] Distributed top-k monitoring. In Proc. of SIGMOD, 2003
[Keralapura06] Communication-efficient distributed monitoring of thresholded counts. In Proc. of SIGMOD, 2006
[Olston03] Adaptive filters for continuous queries over distributed data streams. In Proc. of SIGMOD, 2003

Sketch-set Granularity - G
• High granularity ⇒ More precise, more expensive representation
• Granularity definition: maxSize - minSize
• Used to determine if more flows should be combined in a sketch-set
• Used to send finer granularity during monitor response (for convergence)

Iterative Feedback Algorithm
[Figure: iterative feedback algorithm]

Abilene Trace
[Figure: communication cost for different values of β]
β has little influence on the communication cost

Enterprise Network sFlow Trace
[Figure: larger is better]
All except one parameter pair (green) have false positives and negatives

Scalability with Number of Monitors
[Figure: scalability with number of monitors]

Scalability with Number of Monitors
[Figure: larger is better; panels: sFlow trace, Abilene trace]
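
One round of the query-response feedback from the earlier example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm from the slides: each monitor answers a query with single-IP sketch-sets (the finest granularity, as in the example), whereas the granularity slide suggests the real system may answer at intermediate granularities over several rounds. The names `answer_query` and `aggregate_exact` are hypothetical.

```python
from ipaddress import IPv4Address


def ip(text):
    """Convert dotted-quad text to an integer so ranges can be compared."""
    return int(IPv4Address(text))


def answer_query(flows, start, end):
    """Monitor side: look up flows in [start, end], return single-IP sketch-sets."""
    return [(addr, addr, pkts, pkts)
            for addr, pkts in sorted(flows.items()) if start <= addr <= end]


def aggregate_exact(responses):
    """Aggregator side: single-IP sketch-sets carry exact counts, so sum per IP."""
    totals = {}
    for sketch_sets in responses:
        for start, end, lo, hi in sketch_sets:
            # This sketch only handles finest-granularity responses.
            assert start == end and lo == hi
            totals[start] = totals.get(start, 0) + lo
    return totals


# Per-monitor flow tables (destination IP -> packet count) from the example.
monitor1 = {ip("128.41.10.110"): 110, ip("128.41.10.150"): 100}
monitor2 = {ip("128.41.10.110"): 90, ip("128.41.10.120"): 130}

# The aggregator queries the range it could not resolve from coarse bounds.
query = (ip("128.41.10.110"), ip("128.41.10.120"))
totals = aggregate_exact([answer_query(m, *query) for m in (monitor1, monitor2)])

for addr, pkts in sorted(totals.items()):
    print(IPv4Address(addr), pkts)
```

The totals match the aggregator's fine-grained sketch-sets in the example: 200 packets for 128.41.10.110 and 130 for 128.41.10.120. Only the queried range crosses the network, which is where the communication savings over the naïve approach come from.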