Coordinated Workload Scheduling: A New Application Domain for Mechanism Design
Elie Krevat

Introduction
- Distributed systems are becoming larger and more complex
- Nodes perform computation and storage tasks
- Workloads enter the system and are distributed across nodes
- Clients run many workloads and can pay for resources
- Nodes service many workloads (not dedicated)
- System provides QoS guarantees:
  - Performance: load-balance workloads to faster, free nodes
  - Efficiency: minimize cycles wasted while tasks are available
  - Fairness: nodes share resources across workloads

Benefits of Shared Storage
- Why cluster? Scaling, cost, and management.
- Why share? Slack sharing, economies of scale, uniformity.

Throughput Performance Insulation in Shared Storage
- Each of n workloads on a server:
  - Executes efficiently within its portion of time (timeslice)
  - Ideally: gets ≥ 1/n of its standalone performance
  - In practice: within a fraction of the ideal (a small worked example appears later in this section)
- Argon project [Wachs07] provides bounds on efficiency across workloads for one server
- Problems extending to many servers (cluster-style):
  - Synchronized workloads need coordination of schedules
  - Performance of the system is limited by the slowest node

Timeslice Challenges
[Figure: example timeslice schedules for Workloads 1-4 across Servers A, B, and C, with timeslice lengths of 140 ms, 280 ms, and 100 ms]

Cluster-style Storage Systems
[Figure: a synchronized read in cluster-style storage; a data block is striped into data fragments across the storage servers, the client reads the fragments through a switch, and only after all fragments return does the client send the next batch of requests]

Environment Assumptions
- One client per workload
- Bounded number W of workloads, N of nodes
- Constant set of workloads to be scheduled
  - But the mechanism might support a changing set
- Communication doesn't interfere with computation/storage tasks

Workload Distribution Settings
Two alternative workload distribution settings:
- Setting I: Free Workload Assignment
  - Workloads can be freely assigned to many nodes
  - Example: embarrassingly parallel distributed apps
  - Problem: determine the best set of nodes to assign
- Setting II: Fixed Workload Assignment
  - Workloads must be assigned to a fixed set of nodes
  - Example: cluster-style storage
  - Problem: coordinate responses of nodes with better timeslice scheduling

Computing Environments with Monetary Incentives
- Workloads pay for resources:
  - Weather forecasting
  - Seismic-measurement simulations of oil fields
- Distributed systems sell resources:
  - Supercomputing centers sell resources
  - Shared infrastructures
  - Grid computing
  - Individually owned computers sell spare cycles (SETI@Home for $$)
- May not have a single administrative domain
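A small worked example of the insulation bound from the "Throughput Performance Insulation in Shared Storage" slide above. The efficiency fraction R and the concrete numbers are assumptions introduced here for illustration; the slide itself only states the 1/n ideal and that practice falls within a fraction of it:

\[ T_i \;\ge\; R \cdot \frac{T_i^{\mathrm{alone}}}{n} \]

For example, with n = 4 workloads sharing a server, R = 0.9, and a standalone throughput of 200 MB/s, workload i should still receive at least 0.9 x 200/4 = 45 MB/s.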
Why Mechanism Design?
- Central coordinator(s) lack per-node information
- Enforce cooperation and global QoS
- Different performance capabilities and revenue models
- Efficiency and fairness are not always goals of the players
- Reduce scheduling problems to a general mechanism:
  - Scheduling coordinated workloads is hard (proof later)
  - Divide scheduling problems across nodes
  - Design a mechanism to produce coordination

Outline
- Background and Motivation
- Mechanism I: Free Workload Assignment
- Mechanism II: Fixed Workload Assignment
- Conclusions

Revenue Model: Free Assignment
- Clients pay nodes directly after a task
  - Clients may also pay a fixed cost to a central scheduler
- Workloads want the best and fastest nodes
  - Central scheduler doesn't know the load/speed of nodes
- Nodes are greedy and want lots of workloads
  - Payment is per-workload
  - Amount depends on many factors: speed of response, number of requests/computations per timeslot
  - May lie about load/speed if asked directly
- System goal: assign workloads to the nodes that will respond fastest

Mechanism Design: VCG
- Run an auction to decide which M nodes to assign
  - Nodes respond with bids
  - FIFO approach to scheduling each workload
  - Can also run a combinatorial auction on bundles
- Valuations depend on speed and current load
  - Same factors that affect the final payment
- Apply the Vickrey-Clarke-Groves mechanism
  - First auction iteration finds the top M bids
  - Remove node X, recompute the top M bids
    - An additional auction iteration is not actually necessary
  - Difference between X's bid and the (M+1)st bid is the payment
  - May also normalize payments to share wealth over nodes

Mechanism Results
- Incentive compatible
  - Nodes have no incentive to lie: if they over-report their valuation for a workload, they are still paid their true valuation
- Global efficiency (i.e., the best allocation for the workload)
- Related to the general task allocation problem [Nisan99]
  - k tasks allocated to n agents
  - Goal is to minimize the completion time of the last assignment (makespan)
  - Valuation of an agent is the negation of the total time spent on its tasks
  - Approximation/randomized algorithms exist for the combinatorial auction (CA)

Outline
- Background and Motivation
- Mechanism I: Free Workload Assignment
- Mechanism II: Fixed Workload Assignment
- Conclusions

Revenue Model: Fixed Assignment
- Nodes are paid by the system at every timestep
  - System wants quick resolution of workload requests
  - Nodes need monetary incentives to schedule fully coordinated workloads efficiently and fairly
  - Payment is part of the mechanism's payment scheme
- All M nodes service a workload in the same timeslice
  - Uncoordinated workloads are not important
- System goals:
  - Enforce coordination of workloads per timeslice
  - Achieve fair distribution of resources
  - Achieve efficient schedule allocations

Coordination is Hard
- Reduce the Maximum Independent Set problem to the problem of scheduling the maximum number of fully coordinated workloads per timeslice
- Example assignment:

    Wkld:   1     2      3     4
    Nodes:  A,B   B,C    C,D   A,B,D

    Node:   A     B      C     D
    Wklds:  1,4   1,2,4  2,3   3,4

- For every node xi that services a workload wi, wi has a dependency edge to all other workloads serviced by xi
- NP-complete, but approximation algorithms exist
- For the example above, the maximum independent set is {1,3}

Properties of Schedule Allocations
- Two types of schedule allocations
- Set of workloads serviced by node nx is Sx
- Number of timeslice quanta allocated per workload wi is qi
- Basic quanta timeslice allocation a
- Longer sequence of timeslices atot
- Total quanta count Qx for each node nx
- Delay between consecutive workload schedules in allocation atot is the schedule distance di,k
  - k refers to the schedule instance in atot
- Average schedule distance is di,avg; per node it is dx,avg
- Maximum schedule distance is di,max; per node it is dx,max

Formulas for Schedule Allocation Properties
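A minimal sketch of these properties, assuming the simplest definitions consistent with the bullets above (Ki, the number of schedule instances of workload wi within atot, is notation introduced here for the sketch; the exact forms used in the talk may differ):

\[ Q_x = \sum_{w_i \in S_x} q_i \]

\[ d_{i,\mathrm{avg}} = \frac{1}{K_i} \sum_{k=1}^{K_i} d_{i,k}, \qquad d_{i,\max} = \max_{k} \, d_{i,k} \]

\[ d_{x,\mathrm{avg}} = \frac{1}{|S_x|} \sum_{w_i \in S_x} d_{i,\mathrm{avg}}, \qquad d_{x,\max} = \max_{w_i \in S_x} \, d_{i,\max} \]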
Possible Payment Scheme
- A node is paid a maximum of P credits for each scheduled time quantum
  - No credits for an uncoordinated schedule
- For every cycle of time that a workload isn't scheduled, the payment decreases by c (c << P)
- A node is fined F if it starves a workload over a period of quanta greater than Qthr
- Using the derived properties of schedule allocations, each node calculates payments (a small sketch of such a calculation appears after the Conclusions)

Mechanism Design: Open Research Problem
- Goal is to improve efficiency and fairness
  - But coordination is a hard optimization problem
- Nodes compute their best allocations (through heuristics) using a payment scheme that rewards efficiency/fairness
  - Send valuations to the central scheduler
  - A general mechanism determines the best global allocation
  - May be better suited only for a central scheduler
- Expected properties of a mechanism:
  - Nodes are players
  - No additional utility beyond payments?
  - The auctioned good may be single or total allocations
    - Tradeoff: ability to adapt to changing workloads vs. better assessment of efficient allocations over a longer time

Outline
- Background and Motivation
- Mechanism I: Free Workload Assignment
- Mechanism II: Fixed Workload Assignment
- Conclusions

Conclusions
- Distributed systems environments provide new applications for mechanism design
  - Model and analysis of two different distribution settings
- Goals of better global performance, efficiency, and fairness
  - Not always shared by individual nodes
- Free workload assignment solved with VCG
- Fixed workload assignment is still an open problem
  - Revenue model and goals of the mechanism vary
  - Payment functions use derived allocation properties
  - Coordination of workloads is a hard optimization problem
- Motivation for further research in related areas
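A minimal sketch, in Python, of the per-node payment calculation referenced under "Possible Payment Scheme". Only the roles of P, c, F, and Qthr come from the slides; the data layout (a per-quantum schedule plus a coordination flag), the decay and fine accounting, and all names are assumptions made for this illustration.

```python
# Minimal sketch of the per-node payment scheme from "Possible Payment Scheme".
# Only the roles of P, c, F, and Q_thr come from the slides; everything else
# (data layout, accounting details, names) is assumed for illustration.

def node_payment(schedule, coordinated, P, c, F, Q_thr):
    """Compute one node's payment over a sequence of time quanta.

    schedule    -- list of length T; schedule[t] is the workload id scheduled
                   in quantum t, or None if the node is idle.
    coordinated -- list of length T; coordinated[t] is True if every node
                   serving schedule[t] also scheduled it in quantum t
                   (assumed to be observable, e.g. reported by clients).
    P           -- maximum credits per scheduled, coordinated quantum.
    c           -- per-quantum decay while a workload waits (c << P).
    F           -- fine for starving a workload beyond Q_thr quanta.
    """
    last_seen = {}   # workload id -> quantum when it was last scheduled here
    payment = 0.0

    for t, wid in enumerate(schedule):
        if wid is None:
            continue
        if not coordinated[t]:
            # No credits for an uncoordinated schedule.
            last_seen[wid] = t
            continue

        # Payment starts at P and decreases by c for every quantum the
        # workload has been waiting since it was last scheduled.
        wait = t - last_seen.get(wid, t)
        credit = max(P - c * wait, 0.0)

        # Fine for starving the workload beyond the threshold.
        if wait > Q_thr:
            credit -= F

        payment += credit
        last_seen[wid] = t

    return payment


if __name__ == "__main__":
    # Two workloads interleaved over 8 quanta, all coordinated.
    schedule = [1, 2, 1, 2, None, 1, 2, 1]
    coordinated = [True] * len(schedule)
    print(node_payment(schedule, coordinated, P=10.0, c=0.5, F=20.0, Q_thr=4))
```

Here the decay is charged per quantum of waiting since the workload was last scheduled on this node, which is one reading of "payment decreases by c for every cycle that a workload isn't scheduled"; other readings (for example, charging the decay or the fine even when the workload is never rescheduled) would change the accounting.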