Data Center Traffic Engineering
Jennifer Rexford, Fall 2010 (TTh 1:30-2:50 in COS 302)
COS 561: Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall10/cos561/

Cloud Computing

Cloud Computing
• Elastic resources
 – Expand and contract resources
 – Pay-per-use
 – Infrastructure on demand
• Multi-tenancy
 – Multiple independent users
 – Security and resource isolation
 – Amortize the cost of the (shared) infrastructure
• Flexible service management
 – Resiliency: isolate failures of servers and storage
 – Workload movement: move work to other locations

Cloud Service Models
• Software as a Service
 – Provider licenses applications to users as a service
 – E.g., customer relationship management, e-mail, …
 – Avoids costs of installation, maintenance, patches, …
• Platform as a Service
 – Provider offers a software platform for building applications
 – E.g., Google's App Engine
 – Avoids worrying about scalability of the platform
• Infrastructure as a Service
 – Provider offers raw computing, storage, and network
 – E.g., Amazon's Elastic Compute Cloud (EC2)
 – Avoids buying servers and estimating resource needs

Multi-Tier Applications
• Applications consist of tasks
 – Many separate components
 – Running on different machines
• Commodity computers
 – Many general-purpose computers
 – Not one big mainframe
 – Easier scaling
[Figure: a front-end server feeds a tree of aggregators, which fan out to many workers]

Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified, as on a real machine
• VMs can migrate from one computer to another

Data Center Network

Status Quo: Virtual Switch in Server

Top-of-Rack Architecture
• Rack of servers
 – Commodity servers
 – And a top-of-rack switch
• Modular design
 – Preconfigured racks
 – Power, network, and storage cabling
• Aggregate to the next level

Modularity, Modularity, Modularity
• Containers
• Many containers

Data Center Network Topology
[Figure: a tree from the Internet through core routers (CR) and access routers (AR) down to Ethernet switches (S) and racks, with ~1,000 servers per pod]
• Key: CR = Core Router, AR = Access Router, S = Ethernet Switch, A = Rack of app. servers
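The multi-tier pattern from the earlier slide (a front end fanning out through aggregators to many workers) is classic scatter-gather. A minimal sketch, where `query_worker` and its scoring are invented purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: scores a query against one shard of the data.
def query_worker(shard_id, query):
    return [("doc-%d-%d" % (shard_id, i), len(query) - i) for i in range(3)]

# Aggregator: scatter the query to every worker, then gather and merge answers.
def aggregate(query, num_shards, top_k=5):
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda s: query_worker(s, query),
                                 range(num_shards)))
    merged = sorted((hit for part in partials for hit in part),
                    key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _score in merged[:top_k]]
```

A front end would call `aggregate` (possibly through another layer of aggregators) and render the merged answer; this fan-out/fan-in structure is one reason the lecture later worries about dense traffic matrices and incast.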
Capacity Mismatch
[Figure: oversubscription in the tree: roughly 200:1 across the core routers, 40:1 at the access routers, and 5:1 at the top-of-rack switches]

Data-Center Routing
[Figure: the same tree split into DC-Layer 3 (Internet, core and access routers) and DC-Layer 2 (Ethernet switches); each ~1,000-server pod is one IP subnet]
• Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app. servers

Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
 – Cheaper switch equipment
 – Fixed addresses and auto-configuration
 – Seamless mobility, migration, and failover
• IP routing (layer 3)
 – Scalability through hierarchical addressing
 – Efficiency through shortest-path routing
 – Multipath routing through equal-cost multipath
• So, as in enterprises…
 – Data centers often connect layer-2 islands by IP routers

Load Balancers
• Spread load over server replicas
 – Present a single public address (VIP) for a service
 – Direct each request to a server replica
[Figure: clients send to the virtual IP 192.121.10.1; the load balancer spreads requests over replicas 10.10.10.1, 10.10.10.2, and 10.10.10.3]

Data Center Costs (Monthly)
• Servers: 45%
 – CPU, memory, disk
• Infrastructure: 25%
 – UPS, cooling, power distribution
• Power draw: 15%
 – Electrical utility costs
• Network: 15%
 – Switches, links, transit
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx

Wide-Area Network
[Figure: clients on the Internet are directed among multiple data centers, each with routers and servers, via DNS-based site selection]

Wide-Area Network: Ingress Proxies
[Figure: as above, but clients reach the data centers through ingress proxies in front of each site's router and servers]
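The load-balancer slide's VIP indirection can be sketched as hashing each flow onto one replica. The replica addresses come from the slide's example; the flow fields are illustrative, and a real load balancer would also track replica health:

```python
import hashlib

REPLICAS = ["10.10.10.1", "10.10.10.2", "10.10.10.3"]  # replicas from the slide

def pick_replica(src_ip, src_port, dst_port, replicas=REPLICAS):
    """Deterministically map one flow to one replica behind the VIP."""
    flow = ("%s:%d->%d" % (src_ip, src_port, dst_port)).encode()
    digest = int.from_bytes(hashlib.sha256(flow).digest()[:8], "big")
    return replicas[digest % len(replicas)]
```

Keying the choice on the flow identifier keeps every packet of a TCP connection on the same replica, which matters because the replicas do not share connection state.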
Data Center Traffic Engineering: Challenges and Opportunities

Traffic Engineering Challenges
• Scale
 – Many switches, hosts, and virtual machines
• Churn
 – Large number of component failures
 – Virtual machine (VM) migration
• Traffic characteristics
 – High traffic volume and dense traffic matrix
 – Volatile, unpredictable traffic patterns
• Performance requirements
 – Delay-sensitive applications
 – Resource isolation between tenants

Traffic Engineering Opportunities
• Efficient network
 – Low propagation delay and high capacity
• Specialized topology
 – Fat tree, Clos network, etc.
 – Opportunities for hierarchical addressing
• Control over both network and hosts
 – Joint optimization of routing and server placement
 – Can move network functionality into the end host
• Flexible movement of workload
 – Services replicated at multiple servers and data centers
 – Virtual machine (VM) migration

VL2 Paper
Slides from Changhoon Kim (now at Microsoft)

Virtual Layer 2 Switch
[Figure: the CR/AR/switch tree presented to hosts as one big virtual layer-2 switch]
• 1. L2 semantics
• 2. Uniform high capacity
• 3. Performance isolation

VL2 Goals and Solutions
Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP
("Hose": each node has ingress/egress bandwidth constraints)

Name/Location Separation
Cope with host churn with very little overhead
• Switches run link-state routing and maintain only the switch-level topology
• A directory service maps flat server names to ToR locators (e.g., x → ToR2, y → ToR3, z → ToR3, updated to ToR4 after migration)
 – Allows use of low-cost switches
 – Protects network and hosts from host-state churn
 – Obviates host and switch reconfiguration
[Figure: the sender queries the directory (lookup & response) and tunnels the payload to the destination's ToR; servers use flat names]
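The name-location separation above reduces, at its core, to a directory from flat names to locators plus encapsulation at the sender. A sketch, with hypothetical names and a simple tuple standing in for VL2's actual encapsulation format:

```python
# Directory service: flat application names -> current ToR locator.
directory = {"x": "ToR2", "y": "ToR3", "z": "ToR3"}

def send(dst, payload):
    """Look up dst's locator and encapsulate; switches route on the
    locator alone, so they carry no per-host state."""
    locator = directory[dst]          # lookup & response
    return (locator, dst, payload)    # tunnel toward dst's ToR

packet_before = send("z", b"hello")   # z is still behind ToR3
directory["z"] = "ToR4"               # z migrates; only the directory changes
packet_after = send("z", b"hello")    # same flat name, new locator
```

Because the flat name never changes, a VM keeps its address across migration; only the directory entry is updated.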
Clos Network Topology
Offers huge aggregate capacity and many paths at modest cost
[Figure: a Clos network of intermediate (Int) switches, K aggregation (Aggr) switches with D 10G ports each, and ToRs serving 20 servers each, supporting 20*(DK/4) servers]
D (# of 10G ports) | Max DC size (# of servers)
48 | 11,520
96 | 46,080
144 | 103,680

Valiant Load Balancing: Indirection
Cope with arbitrary traffic matrices with very little overhead
[Figure: each flow is forwarded up to a random intermediate switch, reached through the anycast address IANY via ECMP + IP anycast, then down to the destination ToR]
• Harness the huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
• Equal-cost multipath forwarding: (1) must spread traffic, (2) must ensure destination independence

VL2 vs. SEATTLE
• Similar "virtual layer 2" abstraction
 – Flat end-point addresses
 – Indirection through an intermediate node
• Enterprise networks (SEATTLE)
 – Hard to change hosts → directory on the switches
 – Sparse traffic patterns → effectiveness of caching
 – Predictable traffic patterns → no emphasis on TE
• Data center networks (VL2)
 – Easy to change hosts → move functionality to hosts
 – Dense traffic matrix → reduce dependency on caching
 – Unpredictable traffic patterns → ECMP and VLB for TE

Ongoing Research

Research Questions
• What topology to use in data centers?
 – Reducing wiring complexity
 – Achieving high bisection bandwidth
 – Exploiting capabilities of optics and wireless
• Routing architecture?
 – Flat layer-2 network vs. hybrid switch/router
 – Flat vs. hierarchical addressing
• How to perform traffic engineering?
 – Over-engineering vs. adapting to load
 – Server selection, VM placement, or optimizing routing
• Virtualization of NICs, servers, switches, …
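The Clos sizing on the VL2 slide above (20·(DK/4) servers) can be reproduced directly; taking K = D, as the slide's table implicitly does, yields its numbers:

```python
def max_servers(d_ports, k_aggr_switches=None):
    """Slide formula 20 * (D*K/4): K D-port aggregation switches expose
    D*K/4 ToR-facing ports, and each ToR hosts 20 servers.
    Defaulting K to D is an assumption that matches the slide's table."""
    k = d_ports if k_aggr_switches is None else k_aggr_switches
    return 20 * (d_ports * k // 4)

# Reproduces the slide's table of port count vs. max data-center size.
table = {d: max_servers(d) for d in (48, 96, 144)}
```

Doubling the port count D quadruples the maximum data-center size, since both the number of aggregation switches and the ports per switch grow with D.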
Research Questions
• Rethinking TCP congestion control?
 – Low propagation delay and high bandwidth
 – "Incast" problem leading to bursty packet loss
• Division of labor for TE, access control, …
 – VM, hypervisor, ToR, and core switches/routers
• Reducing energy consumption
 – Better load balancing vs. selective shutdown
• Wide-area traffic engineering
 – Selecting the least-loaded or closest data center
• Security
 – Preventing information leakage and attacks

Discuss