Senior Design Project: Parallel Task Scheduling in Heterogeneous Computing Environments
Senior Design Presentation
Senior Design Students: Christopher Blandin and Dylan Machovec
Post-doctoral Scholar: Bhavesh Khemka
Faculty Advisor: H. J. Siegel

Outline
- motivation
- our system model
- problem statement
- existing work
- simulation details
- future work

Motivation
- High Performance Computing (HPC) used by wide variety of fields to solve challenging problems
  - physics simulations, oil and gas industry, climate modeling, computational biology, computational chemistry, and many more
- improving performance increases productivity in these fields
- we plan on improving performance of system by designing novel scheduling techniques
  - scheduling refers to the assignment and ordering of tasks to machines for execution

System Model – Definitions
- heterogeneity: differing execution characteristics
- homogeneity: have the same execution characteristics
- oversubscribed: more tasks arriving than the system can execute immediately

System Model – Cluster Model
- clusters have multiple homogeneous nodes
- clusters are heterogeneous from each other
- nodes may have multiple multicore processors
- each node may only have one task running at a given time
  - avoids interference between tasks
- task assignments are done at node-level
- a task cannot be spread across two clusters

System Model – Workload Characteristics
- dynamically arriving tasks
- when a task arrives, scheduler obtains the following information:
  - arrival time
  - execution time
    - different times on different clusters (because of heterogeneity)
  - number of processing cores required
  - value function
- tasks are heterogeneous
- no pre-emption

System Model – Value Function
- each task has a value function
  - represents value of the task when it completes
- value function may be different for each task
- monotonically decreasing functions
- value functions can be fully described with four parameters
  - a constant starting value
  - after soft deadline value decays linearly to a final value
  - after hard deadline value drops to zero

Problem Statement
- we measure the performance of a scheduler in our environment as the sum of the value earned by completing tasks over a given amount of time
- goal of heuristics: maximize total sum of value earned over a given amount of time
  - improve performance of HPC systems
- main contribution
  - design, simulation, and analysis of resource allocation heuristics for task scheduling
    - heterogeneous HPC system with multiple clusters
    - tasks with associated value functions with soft and hard deadlines
    - each task executes in parallel over multiple cores

Mapping Event
- mapping event: when task assignment decision(s) are made
- trigger mapping event whenever:
  - a node becomes available, or
  - a task arrives
- during mapping event, all tasks that have not been reserved or have not started execution are considered mappable
- only makes task assignments that can start now
- heuristic may or may not make reservations
[Figure: the set of unmapped tasks and a timeline of tasks on the nodes of cluster 1 and cluster 2, with the current time marked]

Planned Heuristics
- four planned heuristics
  - EASY Backfilling
  - FCFS with Multiple Queues
  - Max-Max Value
  - Max-Max Value-Per-Resource
- submit to Metaheuristics International Conference (MIC 2015)
  - submission deadline: 2/6/15

Existing Work – Dr. Siegel’s Group
- focuses on utility of tasks
  - B. Khemka, R. Friese, L. D. Briceño, H. J. Siegel, A. A. Maciejewski, G. A. Koenig, C. Groer, G. Okonski, M. M. Hilton, R. Rambharos and S. Poole, “Utility Functions and Resource Management in an Oversubscribed Heterogeneous Computing Environment,” IEEE Transactions on Parallel and Distributed Systems, accepted 2014, to appear.
- another work that models stepped value functions
  - J-K Kim, S. Shivle, H. J. Siegel, A. A. Maciejewski, T. D. Braun, et al., “Dynamically Mapping Tasks with Priorities and Multiple Deadlines in a Heterogeneous Environment,” Journal of Parallel and Distributed Computing, vol. 67, no. 2, pp. 154-169, Feb.
    2007.

Existing Work
- other parallel task scheduling techniques
  - EASY Backfilling
    - D. A. Lifka, “The ANL/IBM SP Scheduling System,” Proc. First Workshop Job Scheduling Strategies for Parallel Processing, pp. 295-303, 1995.
    - G. Sabin, R. Kettimuthu, A. Rajan and P. Sadayappan, “Scheduling of Parallel Jobs in a Heterogeneous Multi-Site Environment,” Job Scheduling Strategies for Parallel Processing, pp. 87-104, 2003.

Design of Parallel Simulator for Experiments
- extends existing serial simulator from Dr. Siegel’s group
  - modified to handle scheduling of parallel tasks
- created new modules
  - cluster class has nodes within it
  - methods for obtaining parallel task information from workload trace
  - created a sleep task object to model idle time within each machine
  - developed an algorithm to locate slots for parallel tasks within the area occupied by sleep tasks
  - developed a method that picks the nodes that create the best packing (i.e., create the fewest future restrictions)

Workloads for Simulations
- will use Dr. Dror Feitelson’s Parallel Workload Trace to model the workload arrival
  - workload log from Curie Supercomputer in France (has 93,312 cores)
  - using last 10 months of data
- may use Downey’s model for execution time scaling

Future Work
- use simulator to implement and compare the planned heuristics
- running a post-mortem analysis
  - use a genetic algorithm to find a loose upper bound solution when we know in advance the arrival time and characteristics of all tasks
  - since scheduling is NP-hard it is hard to quantify the performance of heuristics
  - this analysis will give us a better metric to compare our results with

Thank You
Questions? Feedback?

Back-up Slides

Packing Nodes Efficiently
- whenever an assignment is to be made, all heuristics pick the nodes that create the fewest restrictions for future assignments
- e.g., if task t8 needs 3 nodes, it will be assigned: n1, n2, n5
[Figure: timeline of nodes n1 to n5 showing task t8 placed on n1, n2, and n5, with the current time marked]

Heuristics – Overview
- EASY Backfilling
  - considers tasks in a first come first serve (FCFS) order
  - makes only one reservation for the first task that cannot fit on idle machines
  - backfills other tasks so that they do not delay the reservation
- FCFS with Multiple Queues
  - puts the tasks in three queues
  - takes 1, 4, and 8 tasks from the large, medium, and small queues respectively
  - assigns tasks if possible, and otherwise makes the earliest reservation for them
  - repeats until the queues are empty

Heuristics – Overview (continued)
- Max-Max Value
  - first phase: consider all tasks
    - for each task, determine the allocation choice that will earn it the highest value without delaying any place-holder task
    - if there are ties, pick the choice with the earlier completion time
  - second phase: consider tasks from first phase
    - make an assignment or a place-holder for the choice that earns the highest value
    - this assignment should not start execution after the start of the earliest place-holder task
  - repeat the two phases until no more tasks can be mapped
- Max-Max Value-Per-Resource
  - similar to Max-Max Value

Simulation Study
- to model real-world system environment
- experiments run on ISTeC Cray HPC System
- uses real workload traces as inputs
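
Back-up: Value Function Sketch
The four-parameter value function from the System Model slides, and the sum-of-value metric from the Problem Statement, can be sketched as below. This is a minimal illustration only, not the simulator's code. The names ValueFunction, value_at, and total_value_earned are hypothetical, and two details are assumptions the slides do not state: deadlines are expressed on the same time axis as completion times, and a task finishing exactly at the hard deadline still earns the final value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueFunction:
    """Hypothetical container for the four parameters named on the slides."""
    start_value: float    # constant value earned up to the soft deadline
    final_value: float    # value reached at the hard deadline
    soft_deadline: float  # completion time at which linear decay begins
    hard_deadline: float  # completion time after which the value is zero

    def value_at(self, completion_time: float) -> float:
        """Value earned if the task completes at completion_time."""
        if completion_time <= self.soft_deadline:
            return self.start_value
        if completion_time > self.hard_deadline:
            return 0.0
        # Linear decay from start_value to final_value between the deadlines.
        # Assumption: completing exactly at the hard deadline earns final_value.
        span = self.hard_deadline - self.soft_deadline
        fraction = (completion_time - self.soft_deadline) / span
        return self.start_value + fraction * (self.final_value - self.start_value)


def total_value_earned(completions):
    """Sum of value over (value_function, completion_time) pairs.

    This mirrors the performance metric: total value earned by the
    tasks that complete within the measured period.
    """
    return sum(vf.value_at(t) for vf, t in completions)


if __name__ == "__main__":
    vf = ValueFunction(start_value=10.0, final_value=4.0,
                       soft_deadline=100.0, hard_deadline=200.0)
    print(vf.value_at(50.0))   # before the soft deadline: 10.0
    print(vf.value_at(150.0))  # halfway through the decay: 7.0
    print(vf.value_at(250.0))  # after the hard deadline: 0.0
```

A heuristic such as Max-Max Value would call something like value_at with each candidate completion time to compare allocation choices; how the actual simulator stores these parameters is not specified in the slides.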