Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101 http://www.cs.princeton.edu/courses/archive/spr12/cos461/ Networking Case Studies Data Center Enterprise Backbone Cellular Wireless 2 Cloud Computing 3 Cloud Computing • Elastic resources – Expand and contract resources – Pay-per-use – Infrastructure on demand • Multi-tenancy – Multiple independent users – Security and resource isolation – Amortize the cost of the (shared) infrastructure • Flexible service management 4 Cloud Service Models • Software as a Service – Provider licenses applications to users as a service – E.g., customer relationship management, e-mail, .. – Avoid costs of installation, maintenance, patches, … • Platform as a Service – Provider offers platform for building applications – E.g., Google’s App-Engine – Avoid worrying about scalability of platform 5 Cloud Service Models • Infrastructure as a Service – Provider offers raw computing, storage, and network – E.g., Amazon’s Elastic Computing Cloud (EC2) – Avoid buying servers and estimating resource needs 6 Enabling Technology: Virtualization • Multiple virtual machines on one physical machine • Applications run unmodified as on real machine • VM can migrate from one computer to another 7 Multi-Tier Applications • Applications consist of tasks – Many separate components – Running on different machines • Commodity computers – Many general-purpose computers – Not one big mainframe – Easier scaling Multi-Tier Applications Front end Server Aggregator Aggregator Aggregator … … Aggregator … Worker 9 Worker … Worker Worker Worker Data Center Network 10 Virtual Switch in Server 11 Top-of-Rack Architecture • Rack of servers – Commodity servers – And top-of-rack switch • Modular design – Preconfigured racks – Power, network, and storage cabling 12 Aggregate to the Next Level 13 Modularity, Modularity, Modularity • Containers • Many containers 14 Data Center Network Topology Internet CR S AR AR S S S S … ~ 1,000 servers/pod 15 CR ... S … AR AR ... • • • • Key CR = Core Router AR = Access Router S = Ethernet Switch A = Rack of app. servers Capacity Mismatch CR CR ~ 200:1 AR AR AR AR S S S S S S ~ 40:1 S ~ S5:1 … 16 S S … ... S … S … Data-Center Routing Internet CR DC-Layer 3 DC-Layer 2 SS … AR AR SS SS S S SS CR ... SS … ~ 1,000 servers/pod == IP subnet 17 AR AR ... • • • • Key CR = Core Router (L3) AR = Access Router (L3) S = Ethernet Switch (L2) A = Rack of app. servers Reminder: Layer 2 vs. Layer 3 • Ethernet switching (layer 2) – Cheaper switch equipment – Fixed addresses and auto-configuration – Seamless mobility, migration, and failover • IP routing (layer 3) – Scalability through hierarchical addressing – Efficiency through shortest-path routing – Multipath routing through equal-cost multipath • So, like in enterprises… 18 – Connect layer-2 islands by IP routers Case Study: Performance Diagnosis in Data Centers http://www.eecs.berkeley.edu/~minl anyu/writeup/nsdi11.pdf 19 Applications Inside Data Centers …. …. Front end Aggregator Server …. …. Workers 20 Challenges of Datacenter Diagnosis • Multi-tier applications – Hundreds of application components – Tens of thousands of servers • Evolving applications – Add new features, fix bugs – Change components while app is still in operation • Human factors – Developers may not understand network well – Nagle’s algorithm, delayed ACK, etc. 21 Diagnosing in Today’s Data Center App logs: #Reqs/sec Response time 1% req. >200ms delay Host App OS SNAP: Diagnose net-app interactions Packet trace: Filter out trace for long delay req. Packet sniffer Switch logs: #bytes/pkts per minute 22 Problems of Different Logs App logs: Application-specific Packet trace: Too expensive Host App OS SNAP: Generic, fine-grained, and lightweight Runs everywhere, all the time Packet sniffer Switch logs: Too coarse-grained 23 TCP Statistics • Instantaneous snapshots – #Bytes in the send buffer – Congestion window size, receiver window size – Snapshots based on random sampling • Cumulative counters – #FastRetrans, #Timeout – RTT estimation: #SampleRTT, #SumRTT – RwinLimitTime – Calculate difference between two polls 24 Identifying Performance Problems Sender App – Not any other problems Send Buffer – Send buffer is almost full Network Receiver – #Fast retransmission – #Timeout – RwinLimitTime – Delayed ACK Sampling Direct measure Inference diff(SumRTT)/diff(SampleRTT) > MaxDelay 25 SNAP Architecture At each host for every connection Collect data Direct access to OS -Polling per-connection statistics: • Snapshots (#bytes in send buffer) • Cumulative counters (#FastRestrans) -Adaptive tuning of polling rate 26 SNAP Architecture At each host for every connection Collect data Performance Classifier Classifying based on the life of data transfer -Algorithms for detecting performance problems -Based on direct measurement in the OS 27 SNAP Architecture At each host for every connection Collect data Performance Classifier Crossconnection correlation Direct access to data center configurations -Input • Topology, routing information • Mapping from connections to processes/apps -Correlate problems across connections • Sharing the same switch/link, app code 28 SNAP Deployment • Production data center – 8K machines, 700 applications – Ran SNAP for a week, collected petabytes of data • Identified 15 major performance problems – Operators: Characterize key problems in data center – Developers: Quickly pinpoint problems in app software, network stack, and their interactions 29 Characterizing Perf. Limitations #Apps that are limited for > 50% of the time Sender App 551 Apps – Bottlenecked by CPU, disk, etc. – Slow due to app design (small writes) Send Buffer 1 App – Send buffer not large enough Networ k 6 Apps – Fast retransmission – Timeout Receiver 8 Apps – Not reading fast enough (CPU, disk, etc.) 144 30 Apps – Not ACKing fast enough (Delayed ACK) Delayed ACK • Delayed ACK caused significant problems – Delayed ACK was used to reduce bandwidth usage and server interruption B A B has data to send …. Delayed ACK should be disabled in data centers B doesn’t have data to send 200 ms 31 Diagnosing Delayed ACK with SNAP • Monitor at the right place – Scalable, low overhead data collection at all hosts • Algorithms to identify performance problems – Identify delayed ACK with OS information • Correlate problems across connections – Identify the apps with significant delayed ACK issues • Fix the problem with operators and developers – Disable delayed ACK in data centers 32 Conclusion • Cloud computing – Major trend in IT industry – Today’s equivalent of factories • Data center networking – Regular topologies interconnecting VMs – Mix of Ethernet and IP networking • Modular, multi-tier applications – New ways of building applications – New performance challenges 33 Load Balancing 34 Load Balancers • Spread load over server replicas – Present a single public address (VIP) for a service – Direct each request to a server replica 10.10.10.1 Virtual IP (VIP) 192.121.10.1 10.10.10.2 10.10.10.3 35 Wide-Area Network Servers Router DNS Server DNS-based site 36 selection Data Centers Servers Router Internet Clients Wide-Area Network: Ingress Proxies Data Centers Servers Router Proxy Servers Router Proxy Clients 37