SYNERGISTIC NETWORK OPERATIONS
Saqib Raza
University of California, Davis

A SNAPSHOT OF NETWORK OPERATIONS
Forwarding, Inter-domain TE, Scheduling, Intra-domain TE, Firewalls, Maintenance, Traffic Policing, Power, Accounting, Diagnostics, Management, Forensics, Overlay Routing

EXAMPLE: INTER-OPERATION DYNAMICS
Operations: Overlay Routing, Intra-domain TE
[Figure: overlay nodes A, B, C, D and ISP-A]
• Initially, traffic between overlay nodes A and D does not traverse ISP-A.
• ISP-A alters link weights to direct traffic away from link (x,y).
• Sensing reduced delay through ISP-A, the routing overlay starts sending traffic from A to D through ISP-A.

THE HIPPOCRATIC OATH FOR NETWORK OPERATIONS
Do No Harm
• Operations should be cognizant of any disruptive effects on other operations.
Strive to do Good
• Operations should seek to enhance the efficacy of other operations.

SUMMARY/OUTLINE
• Interface-Split Forwarding for Finer-Grained Traffic Engineering [Performance '07, Eval '07]
• Cooperative Peer-to-Peer Repair of 3G Broadcast Losses [Broadnets '08, ICC '08, ICME '07]
• Network-level Footprints of Online Social Network Applications [IMC '09, IMC '08]
• Graceful Network State Migration [Infocom '09]
• MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
• Future Directions

GRACEFUL NETWORK MIGRATION
Do No Harm: Maintenance, Intra-domain TE
minimizing performance disruption during planned network maintenance …
Joint work with: Yuanbo Zhu & Chen-Nee Chuah (UC Davis)

MOTIVATION
Network Events → Performance Disruption
• Inadvertent, e.g. fiber-cuts, router crashes
• Premeditated, e.g. firmware upgrades
Premeditated network tasks can be judiciously scheduled to minimize performance disruption.

GRACEFUL STATE MIGRATION (GSM)
GSM represents a class of problems with two essential characteristics:
• Network needs to transition from an initial state to a final state
• Sequence of atomic network operations (e.g.
deactivating/activating a router or link)

SAMPLE APPLICATION
Link Maintenance Scheduling (LMS)
• Maintenance activities account for more than 20% of failures in backbone ISPs [Markopoulou ‘04].
• Weekly maintenance windows: multiple links need to be maintained in each window.
• Each link needs to be deactivated and then reactivated.
• Link failures can disrupt intra-domain TE.

LMS: ILLUSTRATIVE EXAMPLE
[Figure: example topology with nodes a through g, annotated with link weights]
Link Capacity = C
Flow Size = ½ C
Max Link Util = 50%
"I need to repair links (a,c) and (c,f)"
"Careful! Watch out for the Maximum Link Utilization (MLU)"

[Figure: topology after each step of (a,c) ↓, (a,c) ↑, (c,f) ↓, (c,f) ↑; one link reaches 100% utilization. MLU = 100%]

[Figure: topology after each step of (a,c) ↓, (c,f) ↓, (c,f) ↑, (a,c) ↑. MLU = 50%]

LMS: ILLUSTRATIVE EXAMPLE
Schedule 1: (a,c) ↓ (a,c) ↑ (c,f) ↓ (c,f) ↑, MLU = 100%
Schedule 2: (a,c) ↓ (c,f) ↓ (c,f) ↑ (a,c) ↑, MLU = 50%
The schedule with multiple links simultaneously deactivated causes less disruption.

THE GENERAL GSM PROBLEM
[Figure: state trajectory s0, s1, …, sn]
$(s_0, s_n) = (s_{initial}, s_{final})$
$(s_i, s_{i+1}) \in A$
$n \le B$
$\min C(s_0, s_1, \ldots, s_{n-1}, s_n)$
Specify $(s_{initial}, s_{final})$, A, B, & C to define a concrete GSM problem, e.g., LMS:
• Per-link states: n (not repaired), d (deactivated), r (repaired)
• Allowed transitions in A: n → d, d → r

A GENERAL GSM SOLUTION FRAMEWORK
$c_{2k}(s_x, s_z) = \min_{y} \bigl( c_k(s_x, s_y) + c_k(s_y, s_z) \bigr)$
• The minimum cost of going from $s_x$ to $s_z$ in 2k steps is the minimum, over all intermediate states $s_y$, of the cost of going from $s_x$ to $s_y$ in k steps plus the cost of going from $s_y$ to $s_z$ in k steps.
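
The doubling recurrence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the talk's implementation: it assumes the cost C is additive over transitions, that the state space is small enough to enumerate, and that staying in a state costs 0 (so "in k steps" reads as "in at most k steps"). The function names and the 4-state cost table are hypothetical.

```python
import math

INF = math.inf  # marks transitions that are not allowed (not in A)


def min_plus_square(c):
    """Given c[x][z] = min cost from state x to state z in at most k steps,
    return the same table for at most 2k steps (one doubling)."""
    n = len(c)
    return [
        [min(c[x][y] + c[y][z] for y in range(n)) for z in range(n)]
        for x in range(n)
    ]


def min_cost_within(c1, doublings):
    """Apply the recurrence `doublings` times, giving a step budget of
    2**doublings atomic operations."""
    c = [row[:] for row in c1]
    for _ in range(doublings):
        c = min_plus_square(c)
    return c


# Hypothetical one-step cost table over 4 states (assumption, not from the talk).
# The diagonal is 0 so that a path may use fewer steps than the budget.
c1 = [
    [0,   5,   1,   INF],
    [INF, 0,   INF, 1],
    [INF, 1,   0,   7],
    [INF, INF, INF, 0],
]

c2 = min_cost_within(c1, 1)  # at most 2 steps
c4 = min_cost_within(c1, 2)  # at most 4 steps
print(c2[0][3], c4[0][3])
```

With a budget of 2 steps the cheapest route from state 0 to state 3 goes through state 1; with a budget of 4 steps the cheaper three-step route 0 → 2 → 1 → 3 becomes available. A cost such as MLU, which is a maximum rather than a sum over steps, would need `max` in place of `+` inside the inner expression.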
COMPUTATIONAL COMPLEXITY
GSM is a combinatorial optimization problem
[Figure: LMS state space for three links, with states labeled 000 through 222]
Solution space of LMS has $(2n)!/2^n$ solutions

ANT COLONY OPTIMIZATION
• Swarm intelligence metaheuristic
• Near optimal solutions for the Traveling Salesman Problem

PERFORMANCE EVALUATION
Single-Failure Heuristic works well generally
What about the worst case?
> 20 node/80 link topology
> 100 experiments per data point
> Report Cost Reduction (MLU) over Single-Failure Heuristic

GSM: APPLICATIONS
• Link Weight Assignment Scheduling
• Network Evolution & Upgrade
• MPLS Reroute Sequencing
Link Weight Reassignment Scheduling

OUTLINE
• Graceful Network State Migration [Infocom '09]
• MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
• Future Directions

MEASUROUTING
Strive to do Good: Measurements, Intra-domain TE
a framework for routing assisted network measurements…
Joint work with: Guanyao Huang & Chen-Nee Chuah (UC Davis), Srini Seetharaman & Jatinder Singh (DT Labs)

THE MONITOR PLACEMENT PROBLEM
[Figure: traffic marked "important" and "very important" relative to monitor locations; "Oops!"]
An evolving universe
1. Measurement objectives change
2. New traffic gets introduced
3. Traffic placement changes

PROBLEM STATEMENT
Measurements, Intra-domain TE
• Configure intra-domain routing to route important traffic subpopulations across paths where they can best be monitored, while avoiding disruption to default traffic engineering.

TE POLICY VIOLATION
Congestion

COMPLIANT REROUTING
Monitor
• TE policy is defined for aggregated flows
• Sub-populations of aggregated flows, indistinguishable from a TE perspective, can be distinguishable from a measurement perspective

OTHER ENABLING FACTORS
Aggregate TE Objectives
• Aggregate traffic placement may be altered without violating TE objectives: e.g., links with utilization below maximum utilization have free capacity
TE-Measurement Tradeoff
• TE objectives may be violated to maximize global network utility.
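
The free-capacity point under Aggregate TE Objectives can be made concrete with a small worked example. The sketch below assumes the TE objective is the maximum link utilization (MLU), as in the LMS part of the talk; the link names, loads, capacities, and function names are hypothetical. It computes how much extra traffic each link could carry before it would raise the MLU.

```python
def max_link_utilization(load, capacity):
    """MLU: the largest load/capacity ratio over all links."""
    return max(load[e] / capacity[e] for e in load)


def headroom_below_mlu(load, capacity):
    """Extra load each link can absorb without raising the current MLU."""
    mlu = max_link_utilization(load, capacity)
    return {e: mlu * capacity[e] - load[e] for e in load}


# Hypothetical loads and capacities (assumption, not from the talk).
load = {("a", "b"): 5.0, ("b", "c"): 2.0, ("a", "c"): 1.0}
capacity = {e: 10.0 for e in load}

mlu = max_link_utilization(load, capacity)
headroom = headroom_below_mlu(load, capacity)
print(mlu, headroom)
```

Here link (a,b) sets the MLU at 0.5 and has no headroom, while (b,c) and (a,c) can take 3.0 and 4.0 more units respectively. That slack is what lets selected traffic sub-populations be rerouted toward monitors without changing the aggregate TE objective.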
TE Flowset (macro-flowset)
1. Aggregated TE flows, e.g. OD pair traffic
2. Traffic placement given: Γ for each link (i,j) ∈ E
Measurement Flowsets (micro-flowsets)
1. TE flowset decomposes into k measurement flowsets
2. A measurement flowset has:
   a) Size
   b) Importance
3. Decision variable: routing of each measurement flowset on each link (i,j) ∈ E

MEASUROUTING OBJECTIVE
Objective
• Maximize score across all measurement flowsets across all links
• Points gained for sampling flowset y on link (i,j) depend on: Link Sampling Rate, Flowset Importance, Flowset Size, Flowset Routing
Constraints
• Network Flow Conservation
• Ensure that TE performance remains within some value of the default TE performance

THE LOOPING PROBLEM
• A measurement flowset can only traverse links in a Directed Acyclic Graph (DAG)
• RSR: use DAG for the associated OD pair
• NRL: add additional links to the original DAG

SYNTHETIC EXPERIMENTS
• Select the number of Measurement Flowsets per OD pair (K)
• Divide all flows between an OD pair into the K measurement flowsets
• Assign size and importance of the measurement flowsets
• Choose the permissible TE violation parameter
• Report improvement in Measurement Score over default routing

NETWORK SIZE
AS1221: 44 nodes
AS1239: 52 nodes
K: 10
Importance: Pareto (parameter = 2)
Performance is sensitive to the number of multiple paths

DEGREES OF FREEDOM
AS1221: 44 nodes
TE violation parameter: 0.1
Importance: Pareto (parameter = 2)
Diminishing marginal returns of increasing k

A REAL APPLICATION
Trace Capture for Deep Packet Inspection (DPI)
• Trace capture infrastructure selectively deployed
• Increase representation of interesting traffic in traces
Abilene: 9 nodes
Q(i), P(i), ln(1 − |P(i) − Q(i)|)

REAL WORLD MEASUROUTING
Underlying Routing Substrates
• Configurable Routing: MPLS, OpenFlow
• IP Routing: Equal Cost Multipath
Applications
• Heterogeneous Sampling Algorithms
• Distributed Firewalls

OUTLINE
• Graceful Network State Migration [Infocom '09]
• MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
• Future Directions

OPTIMAL STATES OF BEING
Graceful Network State Migration
• Data Center Job Scheduling
• Data Center Load Distribution

DATA CENTER JOB SCHEDULING
Power Management, Scheduling
• Power conserved by switching off data center components, dynamic voltage scaling, etc.
• Jobs scheduled on different servers to optimize performance (MapReduce, Dryad).
• Jointly optimize job scheduling and power management decisions.

DATA CENTER LOAD DISTRIBUTION
Power Management, Inter-domain TE
• Data center operation costs vary geographically due to energy market price fluctuations [Qureshi '09].
• Makes sense to operate data centers in diverse energy markets.
• Data center load cannot be instantaneously shifted from one location to another.
• Chalk out optimal state trajectory of BGP route advertisements.

A CALCULUS FOR SYNERGISTIC OPERATIONS
Global Utility: Revenue Contribution, Network-wide Security
Common Resource Pool: CPU Cycles, Bandwidth, Power
Each marginal unit of a resource ought to be allocated to the operation that derives the highest marginal utility from consuming it.

Questions
wwwcsif.cs.ucdavis.edu/~raza
www.ece.ucdavis.edu/rubinet

MEASUREMENT UTILITY DIVERSITY
AS1221: 44 nodes
k=10; M=3000
Importance: Pareto (parameter = 2)
Performance improves with variance in importance

LMS IN A SMALL NETWORK (ABILENE)

MEASUROUTING PATH INFLATION