Low-Latency Datacenters
John Ousterhout

The Datacenter Revolution

Phase 1: Scale
● How to use 10,000 servers for a single application?
● New storage systems:
  § Bigtable, HDFS, ...
● New models of computation:
  § MapReduce, Spark, ...
● But latencies are high:
  § Network round-trips: 0.5 ms
  § Disk: 10 ms
● Interactive apps can't access much data

Phase 2: Low Latency
● Latencies dropping dramatically
● Network round-trips:
  § 500 µs → 2.5 µs?
● Storage:
  § 10 ms → 1 µs?
● Potential: new applications (collaboration?)
● Challenge: need a new software stack

April 12, 2016 Low-Latency Datacenters Slide 2

Network Latency (Round Trip)
(Within a datacenter, 100K servers)

Component           2010          Possible Today   5-10 Years
Switching fabric    100-300 µs    5 µs             0.2 µs
Software            50 µs         2 µs             1 µs
NICs                8-128 µs      3 µs             0.2 µs
Propagation delay   1 µs          1 µs             1 µs
Total               200-400 µs    11 µs            2.4 µs

Storage Latency

Disk                                   5–10 ms
Flash                                  50–500 µs
Nonvolatile memory (e.g. 3D XPoint)    1–10 µs

Low-Latency Software Stack
● Existing software stacks are highly layered:
  § Great for software structuring
  § Layer crossings add latency
  § Slow networks and disks hide software latency
● Can't achieve low latency with today's stacks:
  § Death by a thousand cuts
  § Networks:
    ● Complex OS protocol stacks
    ● Marshaling/serialization costs
  § Storage systems:
    ● OS file system overheads
Need significant changes to software stacks

Reducing Software Stack Latency
1. Optimize layers (specialize?)
2. Eliminate layers
3. Bypass layers

The RAMCloud Storage System
● New class of storage for low-latency datacenters:
  § All data in DRAM at all times
  § Low latency: 5-10 µs remote access
  § Large scale: 1000-10,000 servers
● Durability/availability equivalent to replicated disk
● 1000x improvements in:
  § Performance
  § Energy/op
  (relative to disk-based storage)
[Diagram: 1000–100,000 application servers, each running an application linked with the RAMCloud library, connect through the datacenter network to 1000–10,000 storage servers, each running a master and a backup, plus a coordinator.]

Thread Scheduling
● Traditional kernel-based thread scheduling is breaking down:
  § Context switches are too expensive
  § Applications don't know how many cores are available (they can't match workload concurrency to available cores)
  § Kernel may preempt threads at inconvenient points
● Fine-grained thread scheduling must move to applications:
  § Kernel allocates cores to apps over longer time intervals
  § Kernel asks application to release cores
[Diagram: thread scheduling moves from the operating system into the application]
● Arachne project: core-aware thread scheduling
  § Partial design, implementation beginning
  § Initial performance result: 9 ns context switches!

New Datacenter Transport
● TCP optimized for:
  § Throughput, not latency
  § Long-haul networks (high latency)
  § Congestion throughout
  § Modest # connections/server
[Diagram: servers attach to top-of-rack switches, which connect to the datacenter network]
● Future datacenters:
  § High-performance networking fabric:
    ● Low latency
    ● Multi-path
  § Congestion primarily at edges
  § Many connections/server (1M?)
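This slide places congestion at the edge link into each server, and the Homa slide notes that large messages delay small ones. A toy model of one receiver's host-TOR link makes both points concrete. It is a minimal sketch, not Homa's algorithm, and it rests on several assumptions: all messages are already queued when the link frees up, the link sends one message at a time at a fixed rate, and overheads are ignored. The message sizes, the 10 Gb/s link rate, and every name in the code (`Message`, `completion_times_us`, `fifo`, `smallest_first`) are hypothetical.

```python
"""Toy model of a single receiver's downlink (the host-TOR link).

Hypothetical assumptions: every message is already waiting when the link
becomes free, the link sends one message at a time at a fixed rate, and
per-message overheads are ignored.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    size_bytes: int


def completion_times_us(order, link_bytes_per_us):
    """Return {name: completion time in µs} for messages sent back to back in `order`."""
    t = 0.0
    done = {}
    for msg in order:
        t += msg.size_bytes / link_bytes_per_us
        done[msg.name] = t
    return done


def fifo(messages):
    # Arrival order: a small message queued behind large ones waits for all of them.
    return list(messages)


def smallest_first(messages):
    # With every message present at time 0, shortest-remaining-processing-time
    # reduces to sending the smallest message first (sorted() is stable).
    return sorted(messages, key=lambda m: m.size_bytes)


if __name__ == "__main__":
    link = 1250.0  # hypothetical 10 Gb/s link = 1250 bytes per microsecond
    msgs = [Message("big-1", 1_000_000), Message("big-2", 1_000_000), Message("rpc", 100)]
    for policy in (fifo, smallest_first):
        done = completion_times_us(policy(msgs), link)
        print(policy.__name__, "rpc done at", round(done["rpc"], 2), "us")
```

With these assumed numbers, the 100-byte message finishes after about 1600 µs under FIFO but after 0.08 µs when small messages go first, while the last large message is delayed by only 0.08 µs. A real protocol would also have to handle messages arriving over time, preemption, and starvation of large messages.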
Congestion occurs at the edges (host-TOR links): need a new transport protocol

Homa: New Transport Protocol
● Greatest obstacle to low latency:
  § Congestion at receiver's link
  § Large messages delay small ones
● Solution: drive congestion control from receiver
  § Schedule incoming traffic
  § Prioritize small messages
  § Take advantage of priorities in network
● Implemented at user level
  § Designed for kernel bypass, polling-based approach
● Status:
  § Evaluating scheduling algorithms via simulation

Conclusion
● Interesting times for datacenter software
● Revisit fundamental system design decisions
● Exploring from several different angles
● Will the role of the OS change fundamentally?

New Platform Lab
Create the next generation of platforms to stimulate new classes of applications
[Diagram: Platforms, Large Systems, Collaboration]
February 24, 2016 Platform Lab Introduction Slide 12

Platform Lab Faculty
Bill Dally, Nick McKeown, Sachin Katti, Christos Kozyrakis, Phil Levis, John Ousterhout (Faculty Director), Guru Parulkar (Executive Director), Mendel Rosenblum, Keith Winstein

Theme: Swarm Collaboration
[Diagram: Infrastructure; Wired/Wireless Networks; Next-Generation Datacenter Clusters (Cloud/Edge); Device Swarms]

Platform Lab Affiliates

Questions/Comments?

Does Low Latency Matter?
Potential: enable new data-intensive applications
● Application characteristics:
  § Collect many small pieces of data from different sources
  § Irregular access patterns
  § Need interactive/real-time response
● Candidate applications:
  § Large-scale graph algorithms (machine learning?)
  § Collaboration at scale

Large-Scale Collaboration
“Region of Consciousness” (data for one user)
● Gmail: email for one user
● Facebook: 50-500 friends
● Morning commute: 10,000-100,000 cars
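Slide 2's remark that interactive apps can't access much data, and the "collect many small pieces of data" characteristic on the "Does Low Latency Matter?" slide, come down to simple arithmetic when each access depends on the previous one. The following back-of-the-envelope sketch uses the latency figures from slide 2. It assumes strictly sequential (dependent) accesses, with no batching or parallelism, and a hypothetical 100 ms response budget. The function name `sequential_reads` is illustrative only.

```python
"""Back-of-the-envelope: how many dependent accesses fit in an interactive budget?

Hypothetical assumptions: each access waits for the previous one to finish
(no batching or parallelism), and access latency is the only cost.
"""


def sequential_reads(budget_us: float, latency_us: float) -> int:
    """Number of back-to-back accesses that complete within budget_us."""
    return int(budget_us // latency_us)


if __name__ == "__main__":
    budget_us = 100_000.0  # hypothetical 100 ms interactive response budget
    # Latency figures taken from slide 2 (Phase 1 vs. Phase 2).
    cases = [
        ("disk, 10 ms", 10_000.0),
        ("network round trip, 0.5 ms", 500.0),
        ("network round trip, 2.5 us", 2.5),
        ("storage, 1 us", 1.0),
    ]
    for label, latency_us in cases:
        print(f"{label}: {sequential_reads(budget_us, latency_us)} accesses")
```

Under these assumptions, a 0.5 ms round trip allows 200 dependent reads in 100 ms, compared with 40,000 at 2.5 µs. Only the latter reaches the 10,000-100,000 range of the morning-commute example. Real applications overlap requests, so this bounds only the dependent-access case.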