Scheduling Parametric Jobs on the Grid
Jonathan Giddy
J.P.Giddy@wesc.ac.uk

Parametric computation
• Scientifically:
  – Study the behaviour of output variables across a range of different input scenarios
• Computationally:
  – Execute an application multiple times, each time with a different combination of input parameters

Why use the Grid?
• Parametric computations
  – Require high-performance computational resources
  – Require large numbers of computational resources
  – Generate large amounts of concurrency
  – Generate uncoupled computations
  – Tolerate high latencies

Nimrod/G
• Two constraints: a deadline and a budget (cost)
• Example: 8 jobs, a budget of 20, and four nodes in order of increasing price (Node 1 costs 1 per job, Node 2 costs 2, Node 3 costs 3, Node 4 costs 4)

Minimise Cost
• Node 1: jobs 1, 2, 3
• Node 2: jobs 4, 5, 6
• Node 3: jobs 7, 8
• Node 4: unused

  Jobs remaining     8   7   6   5   4   3   2   1   0
  Cost so far        0   1   2   3   5   7   9  12  15
  Budget remaining  20  19  18  17  15  13  11   8   5

Minimise Time
• Node 1: jobs 1, 3
• Node 2: jobs 2, 5
• Node 3: jobs 4, 6
• Node 4: jobs 7, 8

  Jobs remaining      8     7     6     5     4     3     2     1     0
  Cost so far         0     1     3     4     7     9    12    16    20
  Budget remaining   20    19    17    16    13    11     8     4     0
  Budget / job      2.5  2.71  2.83   3.2  3.25  3.67   4.0   4.0     -

• Budget / job is the remaining budget divided by the remaining jobs
• Minimise Cost keeps to the three cheapest nodes and spends 15; Minimise Time spreads the jobs over all four nodes (at most two per node) and spends the whole budget of 20

Globus 1.1 GRAM API
    int globus_gram_client_job_check(
        char *resource_manager_contact,
        const char *description,
        const float conf_percentage,
        globus_gram_client_time_t *estimate,
        globus_gram_client_time_t *interval);
Note: this function is not yet implemented.
• It returns an estimate of the time a job with the given description would take to reach the ACTIVE state.

Historical profiling
• Compare the characteristics of all jobs in the queue against historical profiles to determine the expected start time of a job
• Returns a start time and an error estimate
Warren Smith, Ian T. Foster, Valerie E. Taylor: Predicting Application Run Times Using Historical Information.
Job Scheduling Strategies for Parallel Processing Workshop (JSSPP) 1998: 122-142

Information Overload
• Too many variables:
  – Number of CPUs
  – CPU speed
  – Processor architecture
  – Operating system
  – Real memory
  – Disk speed
  – Bandwidth
  – Latency
  – Other users

Extrapolation of completion rate
[Figure: completion rates per resource on a time axis marked 1 hr and 2 hr: A 2 jobs/hr, B 3 jobs/hr, C 6 jobs/hr]
[Plot: average number of processors (0 to 80) against time (0 to 20 hours) for 10-hour, 15-hour and 20-hour deadlines]

Assumptions
• Compute time >> network time
• All jobs are the same length on any particular resource
• The price of a resource is constant over time
• Not much wriggle room during the endgame
  – Both scheduling schemes push up against the limit they are not minimising (for example, Minimise Time spends the entire budget)
  – Completion time is only estimated heuristically

What we really want…
• Guaranteed completion time
  – globus_gram_client_job_check() with teeth
  – Requires the scheduler to reserve space for the job internally, in advance
• Advance reservation
  – As above, but with an external interface

And this too…
• A real grid economy
  – Incentive for providers to provide resources
  – Incentive for consumers to describe their requirements accurately
  – Incentive for consumers to use resources judiciously
  – Price mechanism
    • budget as a timely, global information parameter
    • universally understood
    • enables trade-offs in making QoS decisions

A final point
• Optimising is really hard in a wide-area network
  – Requires a centralised decision maker
  – Information is missing
  – Information is not contemporaneous
  – Information is out of date

Scalable information
• …is slow to change
• Budget and deadline are (relatively) constant and can be propagated far and wide in a timely manner
• Slow-changing information comes from specifying requirements in the real world
• Satisfying (instead of optimising) a requirement is relatively simple
  – A resource can, so it does
  – A resource can't, so it doesn't
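The Nimrod/G example can be reproduced by two simple greedy rules. The sketch below is a minimal C reconstruction under stated assumptions, not Nimrod/G's own code: node prices of 1 to 4 per job (read off the cost rows), jobs of equal length on every node, and a deadline of three job lengths (so that Node 1 takes three jobs under Minimise Cost). The names minimise_cost, minimise_time and makespan are hypothetical.

```c
#include <assert.h>

#define NODES 4
#define JOBS 8

/* Assumption: per-job price of Node 1..4, in order of increasing price. */
static const int price[NODES] = {1, 2, 3, 4};

/* Minimise cost: fill the cheapest node first, giving each node as many
   jobs as fit before the deadline (measured in job lengths).
   assign[j] receives the node index for job j. Returns the total cost,
   or -1 if the deadline or the budget cannot be met. */
int minimise_cost(int budget, int deadline, int assign[JOBS])
{
    assert(budget >= 0 && deadline > 0);
    int job = 0, cost = 0;
    for (int n = 0; n < NODES && job < JOBS; n++) {
        for (int k = 0; k < deadline && job < JOBS; k++) {
            assign[job++] = n;
            cost += price[n];
        }
    }
    return (job == JOBS && cost <= budget) ? cost : -1;
}

/* Minimise time: give each job to the node that would finish it soonest
   (ties go to the cheaper node), considering only nodes whose price does
   not exceed remaining budget / remaining jobs. Returns the total cost,
   or -1 if some job cannot be placed. */
int minimise_time(int budget, int deadline, int assign[JOBS])
{
    assert(budget >= 0 && deadline > 0);
    int load[NODES] = {0};              /* jobs already queued per node */
    int cost = 0;
    for (int job = 0; job < JOBS; job++) {
        int jobs_left = JOBS - job;
        int budget_left = budget - cost;
        int best = -1;
        for (int n = 0; n < NODES; n++) {
            int finish = load[n] + 1;   /* next completion time on node n */
            if (finish > deadline)
                continue;               /* would miss the deadline */
            if (price[n] * jobs_left > budget_left)
                continue;               /* price exceeds budget per job */
            if (best < 0 || finish < load[best] + 1)
                best = n;               /* strict '<' keeps the cheaper node on ties */
        }
        if (best < 0)
            return -1;
        assign[job] = best;
        load[best]++;
        cost += price[best];
    }
    return cost;
}

/* Completion time in job lengths: the longest queue on any node. */
int makespan(const int assign[JOBS])
{
    int load[NODES] = {0}, longest = 0;
    for (int j = 0; j < JOBS; j++)
        if (++load[assign[j]] > longest)
            longest = load[assign[j]];
    return longest;
}
```

With a budget of 20 and a deadline of 3, minimise_cost returns 15 with a makespan of 3, while minimise_time returns 20 with a makespan of 2 and places jobs exactly as on the slide (1 and 3 on Node 1, 2 and 5 on Node 2, 4 and 6 on Node 3, 7 and 8 on Node 4). Comparing price * jobs_left with budget_left keeps the budget-per-job test in integer arithmetic.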
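The historical-profiling idea can be illustrated with a deliberately simple model. This is not the method of Smith, Foster and Taylor; it assumes a hypothetical history of completed jobs tagged with a category (for example, user and executable), a resource that runs one job at a time in first-come-first-served order, and the mean absolute deviation of past run times as the error estimate. The names past_job, profile and expected_start are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical history record: a job category and its run time (minutes). */
struct past_job {
    int category;
    double runtime;
};

/* Mean and mean absolute deviation of past run times for one category.
   Returns 0 when the history holds no job of that category. */
static int profile(int category, const struct past_job *hist, size_t n,
                   double *mean, double *mad)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (hist[i].category == category) {
            sum += hist[i].runtime;
            count++;
        }
    if (count == 0)
        return 0;
    *mean = sum / (double)count;
    double dev = 0.0;
    for (size_t i = 0; i < n; i++)
        if (hist[i].category == category) {
            double d = hist[i].runtime - *mean;
            dev += d < 0 ? -d : d;
        }
    *mad = dev / (double)count;
    return 1;
}

/* Expected start time of a new job queued behind q jobs: the sum of their
   predicted run times, with the summed deviations as a conservative error
   estimate. Returns 0 if any queued job has no history. */
int expected_start(const int *queued, size_t q,
                   const struct past_job *hist, size_t n,
                   double *start, double *error)
{
    assert(start != NULL && error != NULL);
    *start = 0.0;
    *error = 0.0;
    for (size_t i = 0; i < q; i++) {
        double mean, mad;
        if (!profile(queued[i], hist, n, &mean, &mad))
            return 0;
        *start += mean;
        *error += mad;
    }
    return 1;
}
```

Summing the per-job deviations overstates the error when individual mispredictions partly cancel, which errs on the safe side for a start-time estimate.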
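Extrapolating the completion rate comes down to simple arithmetic. The sketch below assumes each resource's rate is measured as jobs completed over elapsed hours (the slide's A, B and C run at 2, 3 and 6 jobs/hr), projects the finishing time from the combined rate, and, if that misses the deadline, reports the extra rate needed. The function names and the remaining-job counts used with them are hypothetical.

```c
#include <assert.h>

/* Observed completion rate of one resource, in jobs per hour. */
double completion_rate(int jobs_completed, double hours_elapsed)
{
    assert(hours_elapsed > 0.0);
    return jobs_completed / hours_elapsed;
}

/* Combined rate of all resources, in jobs per hour. */
static double total_rate(const double *rates, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += rates[i];
    return total;
}

/* Hours needed to finish `remaining` jobs if every resource keeps its rate. */
double projected_hours(int remaining, const double *rates, int n)
{
    double total = total_rate(rates, n);
    assert(total > 0.0);
    return remaining / total;
}

/* Extra jobs/hr needed to finish by the deadline; 0 if already on schedule. */
double rate_shortfall(int remaining, double hours_left,
                      const double *rates, int n)
{
    assert(hours_left > 0.0);
    double needed = remaining / hours_left;
    double have = total_rate(rates, n);
    return needed > have ? needed - have : 0.0;
}
```

With rates of 2, 3 and 6 jobs/hr (11 jobs/hr combined), 110 remaining jobs would take 10 hours; with only 8 hours left, a further 2.75 jobs/hr of capacity would have to be acquired, which is why a tighter deadline drives up the number of processors in use.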
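A globus_gram_client_job_check() "with teeth" needs the scheduler to reserve space for the job in advance, so that an estimate becomes a promise. A minimal sketch of that reservation step, assuming a hypothetical resource that runs one job at a time and keeps a fixed-size calendar of promised intervals: a request is granted only if it overlaps nothing already promised. struct calendar and reserve() are illustrative names, not part of the Globus API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RESERVATIONS 16

/* Hypothetical calendar of half-open intervals [start, end) already
   promised on one single-job resource. */
struct calendar {
    int count;
    double start[MAX_RESERVATIONS];
    double end[MAX_RESERVATIONS];
};

/* Grant the reservation only if it overlaps no existing promise, so a
   granted start time is a guarantee rather than an estimate. */
bool reserve(struct calendar *cal, double start, double end)
{
    assert(cal != NULL && start < end);
    if (cal->count == MAX_RESERVATIONS)
        return false;                         /* calendar full */
    for (int i = 0; i < cal->count; i++)
        if (start < cal->end[i] && cal->start[i] < end)
            return false;                     /* intervals overlap */
    cal->start[cal->count] = start;
    cal->end[cal->count] = end;
    cal->count++;
    return true;
}
```

Exposing reserve() to outside callers, rather than using it only inside the scheduler, is the difference the slide draws between an internal guarantee and advance reservation.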
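The closing point, that satisfying a requirement is simple where optimising is hard, can be sketched as a purely local admission test. Each resource compares the job's deadline and budget, which change slowly and can be broadcast widely, against its own current state, and accepts if it can. The structure fields (price per job, queue wait, run time) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Requirement broadcast with a job: slow-changing, so it propagates well. */
struct requirement {
    double deadline_hours;   /* time from now by which the job must finish */
    double max_price;        /* most the consumer will pay for this job */
};

/* Hypothetical local view a resource holds of itself. */
struct resource {
    double price;            /* price per job */
    double queue_wait_hours; /* wait before the job could start */
    double run_hours;        /* run time of this job on this resource */
};

/* "A resource can, so it does": decide from local state and the broadcast
   requirement only, with no comparison against other resources. */
bool can_satisfy(const struct resource *r, const struct requirement *req)
{
    assert(r != NULL && req != NULL);
    return r->queue_wait_hours + r->run_hours <= req->deadline_hours
        && r->price <= req->max_price;
}
```

Because the test uses only the requirement and the resource's own state, it needs none of the centralised, contemporaneous information that global optimisation would require.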