Networking the Cloud
Presenter: b97901184, 3rd-year EE, 姜慧如

Data centers hold tens to hundreds of thousands of servers and concurrently support a large number of distinct services, for economies of scale. Keeping utilization high requires dynamically reallocating servers among services as workload patterns change. Agility!

Agility: any machine is able to play any role. "Any service, any server."

Ugly secret: 30% utilization is considered good. Why?
Uneven application fit: -- Each server has CPU, memory, and disk; most applications exhaust one resource, stranding the others.
Long provisioning timescales: -- New servers are purchased quarterly at best.
Uncertainty in demand: -- Demand for a new service can spike quickly.
Risk management: -- Not having spare servers to meet demand brings failure just when success is at hand.
Session state and storage constraints: -- If only the world were stateless servers....

What agility requires:
Workload management -- A means for rapidly installing a service's code on a server. Virtual machines, disk images.
Storage management -- A means for a server to access persistent data. Distributed filesystems.
Network -- A means for communicating with other servers, regardless of where they are in the data center.

But: today's data center networks prevent agility in several ways.

[Figure: the conventional hierarchical topology -- core routers (CR), access routers (AR), switches (S), servers -- with oversubscription of roughly 1:5 at the top-of-rack uplinks, 1:80 at aggregation, and 1:240 at the core. One service says "I want more"; another says "I have spare ones, but..." -- static network assignment strands the spares.]
-- Static network assignment; fragmentation of resources.
-- Poor server-to-server connectivity; traffic of different services affects each other.
-- Poor reliability and utilization.

Today's networks achieve scale by assigning servers topologically significant IP addresses and dividing them among VLANs. This limits the utility of VMs, which cannot migrate out of their original VLAN while keeping the same IP address; it fragments the address space; and it forces reconfiguration whenever servers are reassigned to different services.

[Figure: the same topology annotated with the three objectives: 1. layer-2 semantics, 2. uniform high capacity, 3. performance isolation.]

Developers want a mental model where all their servers, and only their servers, are plugged into a single Ethernet switch.

1. Layer-2 semantics
-- Flat addressing, so any server can have any IP address.
-- Server configuration is the same as in a LAN.
-- A VM keeps the same IP address even after migration.
2. Uniform high capacity
-- Capacity between servers is limited only by their NICs.
-- No need to consider topology when adding servers.
3. Performance isolation
-- Traffic of one service should be unaffected by others.

Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant load balancing)
3. Performance isolation | Enforce the hose model using existing mechanisms only | TCP
("Hose": each node has ingress/egress bandwidth constraints.)
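To make the hose model concrete, here is a minimal Python sketch (not from VL2 itself; function and variable names are illustrative) that checks whether a traffic matrix is admissible under per-server ingress/egress caps -- the set of matrices VL2 aims to support:

```python
# Minimal sketch of the hose model: each server is constrained only by
# its own ingress/egress bandwidth (e.g., its NIC rate), not by where
# it sits in the topology. Names and numbers here are hypothetical.

def hose_admissible(tm, egress, ingress):
    """tm[i][j] = demand from server i to server j (Gbps);
    egress[i] / ingress[i] = per-server bandwidth caps (Gbps)."""
    n = len(tm)
    for i in range(n):
        if sum(tm[i][j] for j in range(n)) > egress[i]:
            return False   # server i would send more than its NIC allows
    for j in range(n):
        if sum(tm[i][j] for i in range(n)) > ingress[j]:
            return False   # server j would receive more than its NIC allows
    return True

# Three servers with 1 Gbps NICs: any matrix whose row and column sums
# stay under 1 Gbps is a valid hose-model traffic matrix, and VL2's
# goal is to support *all* such matrices uniformly.
tm = [[0.0, 0.4, 0.5],
      [0.3, 0.0, 0.3],
      [0.2, 0.5, 0.0]]
print(hose_admissible(tm, egress=[1.0] * 3, ingress=[1.0] * 3))  # True
```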
Ethernet switching (layer 2): cheaper switch equipment; fixed addresses and auto-configuration; seamless mobility, migration, and failover.
IP routing (layer 3): scalability through hierarchical addressing; efficiency through shortest-path routing; multipath routing through equal-cost multipath (ECMP).
So, as in enterprises, data centers often connect layer-2 islands with IP routers.

Data-center traffic analysis: DC traffic != Internet traffic.
-- The ratio of traffic between servers to traffic entering/leaving the data center is 4:1.
-- Demand for bandwidth between servers is growing faster.
-- The network is the bottleneck of computation.
Flow distribution analysis:
-- The majority of flows are small; the biggest flows are around 100 MB.
-- The distribution of internal flows is simpler and more uniform.
-- 50% of the time a machine has about 10 concurrent flows; 5% of the time it has more than 80.
Traffic matrix analysis:
-- Traffic patterns summarize poorly.
-- Traffic patterns are unstable.
Failure characteristics:
-- Networking equipment failures: 95% last < 1 min, 98% < 1 hr, 99.6% < 1 day; 0.09% last > 10 days.
-- No obvious way to eliminate all failures from the top of the hierarchy.

VL2's techniques:
Flat addressing: allows service instances (e.g., virtual machines) to be placed anywhere in the network.
Valiant load balancing (VLB): (randomly) spreads network traffic uniformly across network paths.
End-system-based address resolution: scales to large server pools without introducing complexity to the network control plane.

Design principles:
Randomizing to cope with volatility:
▪ Use VLB to do destination-independent traffic spreading across multiple intermediate nodes.
Building on proven networking technology:
▪ Use IP routing and forwarding technologies already available in commodity switches.
Separating names from locators:
▪ Use a directory system to maintain the mapping between names and locations.
Embracing end systems:
▪ Run a VL2 agent at each server.

Topology: offer huge aggregate capacity and multiple paths at modest cost.
[Figure: VL2's Clos topology -- intermediate (Int) switches on top, K aggregation (Aggr) switches with D ports each, and top-of-rack (ToR) switches with 20 servers each, supporting 20*(DK/4) servers in total.] (A worked sizing example follows below.)

VLB: cope with arbitrary traffic matrices with very little overhead.
[Figure: flows between ToRs T1..T6 are sent up to a randomly chosen intermediate switch -- all intermediates share one anycast address, IANY, reached via ECMP + IP anycast -- and then down to the destination's ToR; separate sets of links serve the up paths and the down paths.]
-- Harness the huge bisection bandwidth.
-- Obviate esoteric traffic engineering or optimization.
-- Ensure robustness to failures.
-- Work with switch mechanisms available today.
Two requirements: 1. must spread traffic (equal-cost multipath forwarding); 2. must ensure destination independence.

How smart servers use dumb switches: encapsulation. Commodity switches have simple forwarding primitives; the complexity -- computing the headers -- is moved to the servers.

[Figure: the VL2 directory system. Lookup path: 1. an agent sends a lookup to a directory server (DS); 2. the DS replies. Update path: 1. an agent sends an update to a DS; 2. the DS sets the mapping on the replicated state machine (RSM) servers; 3. the RSM replicates; 4. the RSM acks; 5. the DS acks the agent; 6. the update is disseminated to the other directory servers.]

Addressing: switches run link-state routing and maintain only the switch-level topology, forwarding on locator addresses (LAs). Servers use flat names: application addresses (AAs). The directory service maps each AA (e.g., x, y, z) to the LA of the ToR hosting it (lookup & response), and the sender's agent encapsulates packets to that ToR.

Data-center OSes are already heavily modified for VMs, storage clouds, etc., so adding an agent is natural; no change is needed to applications or to clients outside the data center.
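As a sanity check on the Clos sizing above, a tiny sketch of the 20*(DK/4) arithmetic (the port and switch counts below are made-up examples, not figures from the slides):

```python
def vl2_server_count(d_ports, k_aggr, servers_per_tor=20):
    # Each aggregation switch points D/2 ports down at ToRs, and each
    # ToR uses 2 aggregation uplinks, so there are (K * D/2) / 2 = DK/4
    # ToRs, each hosting 20 servers.
    tors = d_ports * k_aggr // 4
    return tors * servers_per_tor

# e.g. 72 aggregation switches with 144 ports each:
print(vl2_server_count(144, 72))   # 20 * (144*72/4) = 51,840 servers
```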
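A rough sketch of the per-flow spreading and encapsulation the slides describe. The intermediate-address pool and flow tuple are hypothetical, and the real VL2 agent rewrites actual IP headers in the host's network stack; this shows only the decision logic:

```python
import hashlib

INTERMEDIATE_LAS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical IANY pool

def pick_intermediate(flow):
    # Hash the 5-tuple so every packet of one flow takes the same path
    # (keeps TCP segments in order), while distinct flows spread randomly
    # and independently of their destination -- the VLB property.
    h = int(hashlib.md5(repr(flow).encode()).hexdigest(), 16)
    return INTERMEDIATE_LAS[h % len(INTERMEDIATE_LAS)]

def encapsulate(payload, flow, dst_tor_la):
    # Two outer headers: bounce off a random intermediate switch first,
    # then deliver to the LA of the destination server's ToR. Switches
    # only ever forward on these outer headers.
    return {"outer_dst_1": pick_intermediate(flow),
            "outer_dst_2": dst_tor_la,
            "inner": payload}

flow = ("20.0.0.55", 4321, "20.0.0.66", 80, "tcp")  # src, sport, dst, dport, proto
print(encapsulate(b"payload", flow, "10.1.3.1"))
```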
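And a toy model of the name-location separation above. Class names and addresses are illustrative; the real directory is replicated across servers (the RSM tier in the figure) and reactively corrects stale caches, which is omitted here:

```python
class DirectoryService:
    """Toy stand-in for VL2's replicated directory servers."""
    def __init__(self):
        self.aa_to_la = {}            # flat AA -> LA of the hosting ToR

    def update(self, aa, la):         # called on placement or VM migration
        self.aa_to_la[aa] = la

    def lookup(self, aa):
        return self.aa_to_la.get(aa)

class VL2Agent:
    """Per-server agent: resolves and caches AA->LA so switches never
    need to carry per-server routing state."""
    def __init__(self, directory):
        self.directory, self.cache = directory, {}

    def resolve(self, aa):
        if aa not in self.cache:
            self.cache[aa] = self.directory.lookup(aa)
        return self.cache[aa]

ds = DirectoryService()
ds.update("20.0.0.55", "10.1.2.1")    # an AA currently behind one ToR
agent = VL2Agent(ds)
print(agent.resolve("20.0.0.55"))     # -> "10.1.2.1"
ds.update("20.0.0.55", "10.1.4.1")    # after migration the AA is unchanged;
                                      # only the directory mapping moves
```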
Evaluation.
Uniform high capacity -- all-to-all data shuffle stress test:
▪ 75 servers, each delivering 500 MB.
▪ The maximal achievable goodput is 62.3 MB/s per server.
▪ VL2's network efficiency is 58.8/62.3 = 94%.
Performance isolation -- two services sharing the network:
▪ Service one: 18 servers doing a single TCP transfer all the time.
▪ Service two: 19 servers, each starting an 8 GB transfer over TCP every 2 seconds.
▪ Service two (second experiment): 19 servers bursting short TCP connections.
Convergence after link failures:
▪ 75 servers running an all-to-all data shuffle.
▪ Links between intermediate and aggregation switches are disconnected.
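A quick check of the arithmetic behind the headline number. The MB/s-per-server units and the total shuffle volume follow the VL2 paper's setup (each server sends 500 MB to each of the other 74), which the slide abbreviates:

```python
measured, maximal = 58.8, 62.3                    # goodput per server, MB/s
print(f"efficiency = {measured / maximal:.1%}")   # ~94.4%, quoted as 94%

# Assuming the paper's setup, each of 75 servers delivers 500 MB to each
# of the other 74, so the shuffle moves 75 * 74 * 500 MB in total:
print(75 * 74 * 500 / 1e6, "TB")                  # 2.775 TB
```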