Data Center Traffic Engineering
Jennifer Rexford, Fall 2010 (TTh 1:30-2:50 in COS 302)
COS 561: Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall10/cos561/

Cloud Computing

Cloud Computing
• Elastic resources
 – Expand and contract resources
 – Pay-per-use
 – Infrastructure on demand
• Multi-tenancy
 – Multiple independent users
 – Security and resource isolation
 – Amortize the cost of the (shared) infrastructure
• Flexible service management
 – Resiliency: isolate failures of servers and storage
 – Workload movement: move work to other locations

Cloud Service Models
• Software as a Service
 – Provider licenses applications to users as a service
 – E.g., customer relationship management, e-mail, …
 – Avoids costs of installation, maintenance, patches, …
• Platform as a Service
 – Provider offers a software platform for building applications
 – E.g., Google's App Engine
 – Avoids worrying about scalability of the platform
• Infrastructure as a Service
 – Provider offers raw computing, storage, and network
 – E.g., Amazon's Elastic Compute Cloud (EC2)
 – Avoids buying servers and estimating resource needs

Multi-Tier Applications
• Applications consist of tasks
 – Many separate components
 – Running on different machines
• Commodity computers
 – Many general-purpose computers
 – Not one big mainframe
 – Easier scaling
[Figure: a front-end server feeds a tree of aggregators, which fan out to many workers]

Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified, as on a real machine
• VMs can migrate from one computer to another

Data Center Network

Status Quo: Virtual Switch in Server

Top-of-Rack Architecture
• Rack of servers
 – Commodity servers
 – And a top-of-rack switch
• Modular design
 – Preconfigured racks
 – Power, network, and storage cabling
• Aggregate to the next level

Modularity, Modularity, Modularity
• Containers
• Many containers

Data Center Network Topology
[Figure: a tree from the Internet through core routers (CR) and access routers (AR) down to Ethernet switches (S) and racks, with ~1,000 servers per pod]
• Key: CR = Core Router, AR = Access Router, S = Ethernet Switch, A = Rack of app. servers
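The multi-tier pattern from the earlier slide (a front end fanning out through aggregators to many workers) is classic scatter-gather. A minimal sketch, where `query_worker` and its scoring are invented purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: scores a query against one shard of the data.
def query_worker(shard_id, query):
    return [("doc-%d-%d" % (shard_id, i), len(query) - i) for i in range(3)]

# Aggregator: scatter the query to every worker, then gather and merge answers.
def aggregate(query, num_shards, top_k=5):
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda s: query_worker(s, query),
                                 range(num_shards)))
    merged = sorted((hit for part in partials for hit in part),
                    key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _score in merged[:top_k]]
```

A front end would call `aggregate` (possibly through another layer of aggregators) and render the merged answer; this fan-out/fan-in structure is one reason the lecture later worries about dense traffic matrices and incast.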
Capacity Mismatch
[Figure: oversubscription in the tree: roughly 200:1 across the core routers, 40:1 at the access routers, and 5:1 at the top-of-rack switches]

Data-Center Routing
[Figure: the same tree split into DC-Layer 3 (Internet, core and access routers) and DC-Layer 2 (Ethernet switches); each ~1,000-server pod is one IP subnet]
• Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app. servers

Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
 – Cheaper switch equipment
 – Fixed addresses and auto-configuration
 – Seamless mobility, migration, and failover
• IP routing (layer 3)
 – Scalability through hierarchical addressing
 – Efficiency through shortest-path routing
 – Multipath routing through equal-cost multipath
• So, as in enterprises…
 – Data centers often connect layer-2 islands by IP routers

Load Balancers
• Spread load over server replicas
 – Present a single public address (VIP) for a service
 – Direct each request to a server replica
[Figure: clients send to the virtual IP 192.121.10.1; the load balancer spreads requests over replicas 10.10.10.1, 10.10.10.2, and 10.10.10.3]

Data Center Costs (Monthly)
• Servers: 45%
 – CPU, memory, disk
• Infrastructure: 25%
 – UPS, cooling, power distribution
• Power draw: 15%
 – Electrical utility costs
• Network: 15%
 – Switches, links, transit
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx

Wide-Area Network
[Figure: clients on the Internet are directed among multiple data centers, each with routers and servers, via DNS-based site selection]

Wide-Area Network: Ingress Proxies
[Figure: as above, but clients reach the data centers through ingress proxies in front of each site's router and servers]
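The load-balancer slide's VIP indirection can be sketched as hashing each flow onto one replica. The replica addresses come from the slide's example; the flow fields are illustrative, and a real load balancer would also track replica health:

```python
import hashlib

REPLICAS = ["10.10.10.1", "10.10.10.2", "10.10.10.3"]  # replicas from the slide

def pick_replica(src_ip, src_port, dst_port, replicas=REPLICAS):
    """Deterministically map one flow to one replica behind the VIP."""
    flow = ("%s:%d->%d" % (src_ip, src_port, dst_port)).encode()
    digest = int.from_bytes(hashlib.sha256(flow).digest()[:8], "big")
    return replicas[digest % len(replicas)]
```

Keying the choice on the flow identifier keeps every packet of a TCP connection on the same replica, which matters because the replicas do not share connection state.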
Data Center Traffic Engineering: Challenges and Opportunities

Traffic Engineering Challenges
• Scale
 – Many switches, hosts, and virtual machines
• Churn
 – Large number of component failures
 – Virtual machine (VM) migration
• Traffic characteristics
 – High traffic volume and dense traffic matrix
 – Volatile, unpredictable traffic patterns
• Performance requirements
 – Delay-sensitive applications
 – Resource isolation between tenants

Traffic Engineering Opportunities
• Efficient network
 – Low propagation delay and high capacity
• Specialized topology
 – Fat tree, Clos network, etc.
 – Opportunities for hierarchical addressing
• Control over both network and hosts
 – Joint optimization of routing and server placement
 – Can move network functionality into the end host
• Flexible movement of workload
 – Services replicated at multiple servers and data centers
 – Virtual machine (VM) migration

VL2 Paper
Slides from Changhoon Kim (now at Microsoft)

Virtual Layer 2 Switch
[Figure: the CR/AR/switch tree presented to hosts as one big virtual layer-2 switch]
• 1. L2 semantics
• 2. Uniform high capacity
• 3. Performance isolation

VL2 Goals and Solutions
Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP
("Hose": each node has ingress/egress bandwidth constraints)

Name/Location Separation
Cope with host churn with very little overhead
• Switches run link-state routing and maintain only the switch-level topology
• A directory service maps flat server names to ToR locators (e.g., x → ToR2, y → ToR3, z → ToR3, updated to ToR4 after migration)
 – Allows use of low-cost switches
 – Protects network and hosts from host-state churn
 – Obviates host and switch reconfiguration
[Figure: the sender queries the directory (lookup & response) and tunnels the payload to the destination's ToR; servers use flat names]
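The name-location separation above reduces, at its core, to a directory from flat names to locators plus encapsulation at the sender. A sketch, with hypothetical names and a simple tuple standing in for VL2's actual encapsulation format:

```python
# Directory service: flat application names -> current ToR locator.
directory = {"x": "ToR2", "y": "ToR3", "z": "ToR3"}

def send(dst, payload):
    """Look up dst's locator and encapsulate; switches route on the
    locator alone, so they carry no per-host state."""
    locator = directory[dst]          # lookup & response
    return (locator, dst, payload)    # tunnel toward dst's ToR

packet_before = send("z", b"hello")   # z is still behind ToR3
directory["z"] = "ToR4"               # z migrates; only the directory changes
packet_after = send("z", b"hello")    # same flat name, new locator
```

Because the flat name never changes, a VM keeps its address across migration; only the directory entry is updated.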
Clos Network Topology
Offers huge aggregate capacity and many paths at modest cost
[Figure: a Clos network of intermediate (Int) switches, K aggregation (Aggr) switches with D 10G ports each, and ToRs serving 20 servers each, supporting 20*(DK/4) servers]
D (# of 10G ports) | Max DC size (# of servers)
48 | 11,520
96 | 46,080
144 | 103,680

Valiant Load Balancing: Indirection
Cope with arbitrary traffic matrices with very little overhead
[Figure: each flow is forwarded up to a random intermediate switch, reached through the anycast address IANY via ECMP + IP anycast, then down to the destination ToR]
• Harness the huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
• Equal-cost multipath forwarding: (1) must spread traffic, (2) must ensure destination independence

VL2 vs. SEATTLE
• Similar "virtual layer 2" abstraction
 – Flat end-point addresses
 – Indirection through an intermediate node
• Enterprise networks (SEATTLE)
 – Hard to change hosts → directory on the switches
 – Sparse traffic patterns → effectiveness of caching
 – Predictable traffic patterns → no emphasis on TE
• Data center networks (VL2)
 – Easy to change hosts → move functionality to hosts
 – Dense traffic matrix → reduce dependency on caching
 – Unpredictable traffic patterns → ECMP and VLB for TE

Ongoing Research

Research Questions
• What topology to use in data centers?
 – Reducing wiring complexity
 – Achieving high bisection bandwidth
 – Exploiting capabilities of optics and wireless
• Routing architecture?
 – Flat layer-2 network vs. hybrid switch/router
 – Flat vs. hierarchical addressing
• How to perform traffic engineering?
 – Over-engineering vs. adapting to load
 – Server selection, VM placement, or optimizing routing
• Virtualization of NICs, servers, switches, …
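The Clos sizing on the VL2 slide above (20·(DK/4) servers) can be reproduced directly; taking K = D, as the slide's table implicitly does, yields its numbers:

```python
def max_servers(d_ports, k_aggr_switches=None):
    """Slide formula 20 * (D*K/4): K D-port aggregation switches expose
    D*K/4 ToR-facing ports, and each ToR hosts 20 servers.
    Defaulting K to D is an assumption that matches the slide's table."""
    k = d_ports if k_aggr_switches is None else k_aggr_switches
    return 20 * (d_ports * k // 4)

# Reproduces the slide's table of port count vs. max data-center size.
table = {d: max_servers(d) for d in (48, 96, 144)}
```

Doubling the port count D quadruples the maximum data-center size, since both the number of aggregation switches and the ports per switch grow with D.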
Research Questions
• Rethinking TCP congestion control?
 – Low propagation delay and high bandwidth
 – "Incast" problem leading to bursty packet loss
• Division of labor for TE, access control, …
 – VM, hypervisor, ToR, and core switches/routers
• Reducing energy consumption
 – Better load balancing vs. selective shutdown
• Wide-area traffic engineering
 – Selecting the least-loaded or closest data center
• Security
 – Preventing information leakage and attacks

Discuss