Data Structures of the Future: Concurrent, Optimistic, and Relaxed
Dan Alistarh, ETH Zurich

Background
• The First Graph Problem: the Bridges of Königsberg (Euler, 1735)
  • 4 vertices, 7 edges
• Graph Problems, circa 2010
  • Social graphs: 1 billion vertices, 100 billion edges (~1 TB of storage)
• Graph Problems Today
  • Web/office/brain graphs: 100 billion vertices, 100 trillion edges
  • More than a petabyte of storage

We distribute computation across processors, computers, and data centers. This changes the way data structures are designed, built, and deployed.

Why Concurrent?
To get speedup on newer hardware. Scaling: more threads should imply more useful work.

The Problem with Concurrency
[Figure: throughput of parallel event processing (events/second, up to ~6×10^6) against the number of threads (0–70), for a shared event queue. Past a point, an expensive (>$10,000) machine running many threads delivers less throughput than a cheap (<$1,000) machine running few threads.]
Concurrency can be very bad value for money. Is this problem inherent?

Inherent Sequential Bottlenecks
Data structures with strong ordering semantics:
• Stacks, queues, priority queues, exact counters
Theorem: Assuming n threads, any deterministic, strongly ordered data structure has an execution in which a processor takes time linear in n to return. [Alistarh, Aspnes, Gilbert, Guerraoui, Journal of the ACM, 2014]

This matters because of Amdahl's Law:
• Assume the single-threaded computation takes 1 week
• The inherently sequential component (e.g., a shared queue) takes 15% of that, about 1 day
• Then the maximum speedup is below 7x, even with infinitely many threads
To get performance, it is critical to speed up shared data structures.

Concurrent Data Structures
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
New algorithmic and analytic ideas. New hardware designs! New data structures!
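The Amdahl's Law numbers above are easy to check. A minimal sketch (the function name `amdahl_speedup` is ours, not from the slides):

```python
def amdahl_speedup(serial_fraction, n_threads):
    """Amdahl's Law: maximum speedup when a fixed fraction of the
    computation is inherently sequential."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# A 15% sequential component (e.g., a shared queue) caps the speedup:
print(amdahl_speedup(0.15, 8))      # 8 threads: ~3.9x
print(amdahl_speedup(0.15, 10**9))  # effectively infinite threads: ~6.67x
```

With a 15% serial fraction, the limit as the thread count grows is 1/0.15 ≈ 6.67, matching the "maximum speedup < 7x" claim on the slide.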
Discrete Event Simulation
A priority queue stores <key, value> tasks and supports:
• Insert/Delete(k, v): insert or delete a task
• DeleteMin(): get the top task
• Search(key): search for a task
Extremely useful for:
• Graph operations (shortest paths)
• Operating system kernels
• Time-based simulations
We are looking for a fast concurrent priority queue.

The Problem
Target: a fast, concurrent priority queue.
Lots of work on the topic: [Sanders97], [Lotan&Shavit00], [Sundell&Tsigas07], [Linden&Jonsson13], [Lenhart et al. 14], [Wimmer et al. 14]
Current solutions are hard to scale: DeleteMin is highly contended. Everyone wants the same element!

Concurrent Solution: the SkipList [Pugh90]
● A linked list, sorted by priority
● Each node has a random "height" (geometrically distributed with parameter 1/2)
● Elements at the same height form their own lists
● Search, Insert, and Delete take logarithmic time on average and work concurrently [Pugh98, Fraser04]
[Figure: Search(5) in the skiplist head → 1 → 3 → 4 → 5 → 9 → … → tail; descending level by level, the search interval narrows from [H, 9] to [H, 9] to [1, 9] to [5, 9], then stops at 5.]

The SkipList as a Priority Queue
● DeleteMin: simply remove the smallest element from the bottom list
● All processors compete for the smallest element
● Does not scale!
[I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. 2000.]

The Idea: Relax!
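The SkipList described above is easiest to see in its sequential form. Below is a minimal single-threaded sketch (class and method names are ours); the concurrent variants cited on the slides [Pugh98, Fraser04] additionally handle synchronization:

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor at level i

class SkipList:
    MAX_HEIGHT = 16

    def __init__(self):
        # Sentinel head that compares below every real key.
        self.head = Node(float('-inf'), self.MAX_HEIGHT)

    def _random_height(self):
        # Geometric with parameter 1/2, as on the slides.
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1
        return h

    def _find_predecessors(self, key):
        # Descend from the top level, moving right while the
        # next key is still below the target.
        update = [None] * self.MAX_HEIGHT
        cur = self.head
        for i in reversed(range(self.MAX_HEIGHT)):
            while cur.next[i] is not None and cur.next[i].key < key:
                cur = cur.next[i]
            update[i] = cur
        return update

    def insert(self, key):
        node = Node(key, self._random_height())
        update = self._find_predecessors(key)
        for i in range(len(node.next)):
            node.next[i] = update[i].next[i]
            update[i].next[i] = node

    def contains(self, key):
        pred = self._find_predecessors(key)[0]
        nxt = pred.next[0]
        return nxt is not None and nxt.key == key
```

For the example on the slide, inserting 1, 3, 4, 5, 9 and calling `contains(5)` walks down the levels exactly as the narrowing-interval figure shows.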
● We want to choose an item at random, with "good" guarantees
● Minimize the loss of exactness by only choosing items near the front of the list
● Minimize contention by keeping the collision probability low

DeleteMin: The Spray [Alistarh, Kopinsky, Li, Shavit, PPoPP 2015]
● At each skiplist level, flip a coin to stay or jump forward
● Repeat for each level, from log n down to 1 (the bottom)
● As if removing a random high-priority element near the head
[Figure: two example Spray walks (jump, stay, jump, jump) for starting height 4.]
Spray and pray?

SprayList Probabilistic Guarantees
Let p(x) be the probability that a Spray returns the value at index x, for n threads.
✓ The maximum value returned by a Spray has rank Õ(n): Sprays aren't too wide.
✓ For all x, p(x) = Õ(1/n): Sprays don't cluster too much.
✓ If x > y is returned by some Spray, then p(y) = Ω(1/n): elements do not starve in the list.

The Benchmark
• Discrete event simulation
• Exact algorithms show negative scaling beyond 8 threads
• The SprayList is competitive with a random remover (which offers no guarantees and yields an incorrect execution)
In many practical settings (discrete event simulation, shortest paths), priority inversions are not expensive.

DeleteMin: The Spray
node* DeleteMin() {
    cur <- head;                      // starting node
    i <- log n;                       // starting height
    while (i > 0) {
        repeat rand(0, 1) times {     // stay, or skip forward?
            cur <- cur->next[i];
        }
        i <- i - 1;                   // descend one level
    }
    v <- cur->val;                    // reached the bottom list
    flag <- Compare-and-Swap(cur->val, v, NULL);   // acquire the node
    if (flag == SUCCESS) return cur;
    else RETRY;
}
The SprayList relaxes progress as well!

Relaxed Data Structures
The data structures of our childhood are changing. The SprayList merges relaxed semantics and optimistic progress to achieve scalability.
A relaxation renaissance: [KarpZhang93], [DeoP92], [Sanders98], [HenzingerKPSS13], [NguyenLP13], [WimmerCVTT14], [LenhartNP15], [RihaniSD15], [JeffreySYES16]

My Research
Algorithms, data structures, and architectures for scalable distributed computation.
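The width guarantee can be sanity-checked in an idealized model. Assuming (our simplification, not the paper's exact analysis) that a level-i pointer skips exactly 2^i keys, a Spray that takes at most one jump per level from height log n lands within the first 2n − 2 positions for n threads, consistent with the Õ(n) rank bound:

```python
import math
import random

def spray_rank(n_threads, rng=random):
    """Rank (distance from the head) at which one Spray lands, in an
    idealized skiplist where a level-i pointer skips exactly 2**i keys.
    Real SprayLists tune the starting height and per-level jump lengths."""
    rank = 0
    for level in range(int(math.log2(n_threads)), 0, -1):  # log n down to 1
        if rng.random() < 0.5:       # flip a coin: jump forward, or stay
            rank += 2 ** level       # one level-`level` jump skips ~2**level keys
    return rank

ranks = [spray_rank(64) for _ in range(2000)]
```

In this model the worst case is taking every jump, i.e. 2 + 4 + … + n = 2n − 2, so no Spray leaves the front O(n) segment of the list.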
Theory ↔ Software ↔ Hardware
Algorithms and Data Structures. Architectures and Systems. Applications.
Interested? Internship / Master / PhD.

What's Next?
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
• Algorithms and Data Structures: next-generation data structures, population protocols
• Architectures and Systems: low-latency transactional systems, optimization for the cloud
• Applications: large-scale graph processing, distributed machine learning

Backup: SprayList Shortest Paths

DeleteMin: The Spray
(Pseudocode as shown earlier; the parameters, such as the starting height log n and the per-level jump length rand(0, 1), can be tuned.)

The Scheduler (Intel™ machine, single socket)

The Stochastic Scheduler Model
• Short version: every (non-faulty) thread can be scheduled in each step, with probability > 0.
• Definition: a scheduler is a sequence of triples (D_t, A_t, θ) for t > 0, where D_t is the distribution at time t, A_t is the active set at time t, and θ is the probability threshold, such that:
  • At time t, scheduling probabilities are given by D_t
  • Only processes in A_t can be scheduled at time t
  • Each process in A_t is scheduled with probability ≥ θ
A scheduler is stochastic if θ > 0.
* Lottery OS scheduling, e.g. [Petrou, Milford, Gibson]

Examples
• Assume n processes.
• The uniform stochastic scheduler:
  • θ = 1/n
  • Each process is scheduled uniformly at random
• A standard shared-memory adversary:
  • Take any adversarial strategy
  • Let D_t give probability 1 to the process picked by the strategy, and 0 to all others
  • Not stochastic
• Quantum-based schedulers are also easily modeled.
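The (D_t, A_t, θ) definition above can be sketched directly. The helper below (names are ours) samples one scheduling step and checks the θ threshold over the active set; the uniform stochastic scheduler is the special case θ = 1/n:

```python
import random

def stochastic_step(dist, active, theta, rng):
    """One step of a (D_t, A_t, theta) scheduler: sample one process to run
    from `dist`, restricted to the active set. A valid stochastic scheduler
    gives every active process probability at least theta > 0."""
    assert all(dist.get(p, 0.0) >= theta for p in active), \
        "not a stochastic scheduler: an active process falls below theta"
    procs = list(active)
    return rng.choices(procs, weights=[dist[p] for p in procs])[0]

# The uniform stochastic scheduler over n = 4 processes: theta = 1/n.
rng = random.Random(0)
uniform = {p: 1 / 4 for p in range(4)}
trace = [stochastic_step(uniform, {0, 1, 2, 3}, 1 / 4, rng) for _ in range(8)]
```

An adversarial distribution that puts probability 1 on a single process fails the θ check for any θ > 0, which is exactly why the standard shared-memory adversary is not stochastic.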