Data Streaming in Computer Networking
Cristian Estan, George Varghese
University of California, San Diego
June 8, 2003 Data streaming in computer networking - MPDS 2003

Talk structure
  Traditional streaming in networking
    Rules of the game
    Iteration paradigm: packet scheduling example
  New streaming problems
    Detecting malicious traffic
    Understanding network workloads

Internet service model
  [Figure: a packet made of a Header (source IP address, destination IP address, source port, destination port) and Data; a Flow of packets crossing the Internet]

Traditional router functions
  IP lookup
    [Figure: a packet arriving on one of Incoming 1-3 is labeled "?", then "Out2" after the lookup]
  Switching
    [Figure: packets labeled Out1, Out2 and Out3 move from Incoming 1-3 to Outgoing 1-3]
  Scheduling
    [Figure: packets of Flow 1, Flow 2 and Flow 3 shown between the incoming and outgoing links]

Rules of the game
  Wire speed processing
    At 40 gigabits/s, 8 nanoseconds per packet - need fast SRAM
    Limited SRAM (say 32 megabits) but millions of flows
  What does this mean for algorithms?
    Low worst-case complexity bounds
    Low bounds on the amount of memory used
  Differences from databases
    One pass vs. multiple passes
    Worst case vs. average case
    Small constants vs. asymptotic complexity

Iteration paradigm
  Many networking algorithms use iteration in time
  Iteration allows multi-pass algorithms without storing the input, by assuming the input does not change quickly
  Many examples (MULTOPS for DoS detection [Gil01], CSFQ for scheduling [Stoica98])
  It would be nice to formalize the tradeoff between quality of results and drift rate of the input

Example: Core Stateless FQ
  [Figure: packets carry a label R]
  Mark rate R
  If R > F, drop with probability 1 - F/R
  Iteratively compute fair share F

New streaming problems
  Detecting malicious activity
    Flooding (denial of service attacks)
    Worms
    Scans looking for vulnerable servers
  Understanding workloads
    Billing
    Planning network growth
    Application mix

Detecting malicious traffic
  Well-defined building blocks
    Detecting large aggregates
      » Similar to iceberg queries
    Counting active flows in an aggregate
      » Similar to counting distinct values
  Many open problems: e.g. detecting worms and DoS attacks (it is not clear what the right formal problem statement is)

Informal problem definition
  [Figure: terabytes of measurement data -> analysis -> traffic reports]
  Reports on one field at a time
    Applications: 50% of traffic is Kazaa
    Sources: 20% of traffic comes from Steve's PC
  Reports combining fields
    20% is Kazaa from Steve's PC
    50% is Kazaa from the dorms

Formal problem definition
  Define clusters:
    Atoms: fields 1 to n, with a hierarchy in each field that includes *
    Cluster: intersection of one set from each field hierarchy
    Example: Source=*, Destination=CS Net, App=Email
  Threshold clusters:
    Report traffic clusters above threshold T (e.g. 1% of traffic)
  Omit redundant clusters:
    Compression rule: remove a general cluster from the report when its traffic can be inferred (up to error T) from nonoverlapping, more specific clusters

Solution status
  The good:
    Offline tool AutoFocus; SIGCOMM 2003 paper
    Detected worm, busy servers, squid cache, etc.
    Network managers like it
  The bad:
    Takes long: 3 hours at T=0.5% for a one-day trace
    Needs much memory: 300 Mbytes
  The wanted:
    Streaming algorithm - we invite improvements

Conclusions
  New rules: strict constraints on algorithms running in routers
  Iteration in time: can give simple algorithms, but needs more formalization as to quality of results
  General open problems: many challenges in detecting malicious traffic such as worms and DoS attacks
  Specific open problem: computing traffic cluster reports in streaming fashion

Thank you!
  [Figure: labeled "Databases", with a question mark]

Unidimensional clusters
  [Figure: traffic per IP address, aggregated up the prefix hierarchy]
  Address        Traffic
  10.8.0.2          15
  10.8.0.3          35
  10.8.0.4          30
  10.8.0.5          40
  10.8.0.8         160
  10.8.0.9         110
  10.8.0.10         35
  10.8.0.14         75

  Prefix         Traffic
  10.8.0.0/28      500
  10.8.0.0/29      120
  10.8.0.8/29      380
  10.8.0.0/30       50
  10.8.0.4/30       70
  10.8.0.8/30      305
  10.8.0.12/30      75
  10.8.0.2/31       50
  10.8.0.4/31       70
  10.8.0.8/31      270
  10.8.0.10/31      35
  10.8.0.14/31      75

  Clusters remaining on the later build slides (each has traffic of at least 110; every omitted cluster has at most 75):
  10.8.0.0/28      500
  10.8.0.0/29      120
  10.8.0.8/29      380
  10.8.0.8/30      305
  10.8.0.8/31      270
  10.8.0.8         160
  10.8.0.9         110

Multidimensional clusters
  Two dimensions
    Source network
    Protocol (traffic type)
  Trees turn into a lattice
    Multiple parents
    Nodes overlap

Offline solution
  [Figure not recovered]

Sample report
  [Figure not recovered]
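
Worked example: the wire-speed budget

The "Rules of the game" slide quotes 8 nanoseconds per packet at 40 gigabits/s and 32 megabits of SRAM for millions of flows. The sketch below only redoes that arithmetic. The 40-byte packet size and the flow counts are assumptions chosen for illustration (the slides do not state them), and the function names are hypothetical.

```python
def per_packet_budget_ns(link_gbps: float, packet_bytes: int) -> float:
    """Time available per packet, in nanoseconds, on a fully loaded link."""
    # bits divided by (gigabits per second) gives nanoseconds
    return packet_bytes * 8 / link_gbps


def sram_bits_per_flow(sram_megabits: float, flows: int) -> float:
    """SRAM available per flow if every flow gets an equal share."""
    return sram_megabits * 1_000_000 / flows


if __name__ == "__main__":
    # Assumption: back-to-back 40-byte packets (not stated on the slide).
    print(per_packet_budget_ns(link_gbps=40, packet_bytes=40), "ns per packet")
    # Assumption: 1 million and 4 million concurrent flows.
    for flows in (1_000_000, 4_000_000):
        print(flows, "flows:", sram_bits_per_flow(32, flows), "bits per flow")
```

Under these assumptions there are only a few tens of bits of fast memory per flow, which is why per-flow state in SRAM does not scale and why the slide asks for low bounds on memory.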
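
Worked example: iterating toward the fair share

The "Example: Core Stateless FQ" slide gives the drop rule (drop with probability 1 - F/R when R > F) and says the fair share F is computed iteratively. The sketch below is a simplified offline illustration of that idea, not the estimator used by CSFQ itself: it assumes the flow rates are known and fixed, and repeatedly rescales F by capacity divided by accepted traffic until the accepted traffic matches the link capacity. The function names, rates and capacity are hypothetical.

```python
def drop_probability(rate: float, fair_share: float) -> float:
    """Drop rule from the slide: if R > F, drop with probability 1 - F/R."""
    if rate <= fair_share:
        return 0.0
    return 1.0 - fair_share / rate


def iterate_fair_share(rates, capacity, rounds=100):
    """Fixed-point iteration for F such that sum(min(R, F)) equals capacity.

    Assumption: rates stay constant across rounds, which stands in for the
    'inputs do not change quickly' premise of the iteration paradigm.
    """
    if sum(rates) <= capacity:
        return max(rates)  # no congestion, so nothing needs to be dropped
    fair_share = max(rates)
    for _ in range(rounds):
        accepted = sum(min(r, fair_share) for r in rates)
        fair_share *= capacity / accepted  # shrink F while over capacity
    return fair_share


if __name__ == "__main__":
    rates = [1.0, 2.0, 8.0]  # hypothetical flow rates
    f = iterate_fair_share(rates, capacity=6.0)
    print("fair share:", round(f, 6))
    for r in rates:
        print("rate", r, "drop probability", round(drop_probability(r, f), 6))
```

Each round is one pass over the current rates, so the multi-pass computation is spread over time instead of being run over a stored copy of the input.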
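
Worked example: unidimensional clusters with the compression rule

The numbers on the "Unidimensional clusters" slides can be reproduced with a short offline sketch: sum per-address traffic up the prefix hierarchy, keep clusters at or above a threshold T, then apply the compression rule from the "Formal problem definition" slide. The threshold of 100 is an assumption (the slides do not state it), the function names are hypothetical, and this is an illustration of the rule for one field only, not the AutoFocus implementation.

```python
import ipaddress

# Per-address traffic taken from the "Unidimensional clusters" slides.
LEAF_TRAFFIC = {
    "10.8.0.2": 15, "10.8.0.3": 35, "10.8.0.4": 30, "10.8.0.5": 40,
    "10.8.0.8": 160, "10.8.0.9": 110, "10.8.0.10": 35, "10.8.0.14": 75,
}


def prefix_totals(leaf_traffic, root_prefixlen=28):
    """Add each address's traffic to every enclosing prefix down to /32."""
    totals = {}
    for addr, traffic in leaf_traffic.items():
        for plen in range(32, root_prefixlen - 1, -1):
            net = ipaddress.ip_network(f"{addr}/{plen}", strict=False)
            totals[net] = totals.get(net, 0) + traffic
    return totals


def compressed_report(totals, threshold):
    """Keep a cluster only if its traffic exceeds, by at least the threshold,
    what can be inferred from nonoverlapping, more specific reported clusters."""
    report = []
    candidates = sorted((n for n, t in totals.items() if t >= threshold),
                        key=lambda n: n.prefixlen, reverse=True)
    for net in candidates:  # most specific clusters first
        inside = [r for r in report if r.subnet_of(net)]
        # Keep only the largest reported clusters inside net, so none overlap.
        maximal = [r for r in inside
                   if not any(r != o and r.subnet_of(o) for o in inside)]
        inferred = sum(totals[r] for r in maximal)
        if totals[net] - inferred >= threshold:
            report.append(net)
    return report


if __name__ == "__main__":
    totals = prefix_totals(LEAF_TRAFFIC)
    for net in compressed_report(totals, threshold=100):  # assumed T
        print(net, totals[net])
```

This version holds every prefix total in memory and sorts them, so it is an offline computation; doing the same in one pass with little memory is the open problem the talk poses.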